Probability that an allele frequency fixes in Wright-Fisher genetic drift
by greenturtle3141, May 5, 2022, 2:38 AM
Introduction
Reading Level: 2-3/5
Let's say there's a population of 100 cats, and:
The answer, by some miracle, is that it's precisely the proportion of the cats' alleles that are the blue-eyes allele. As a reminder, a trait is determined by two alleles, and e.g. in this case is written as either BB, Bb, or bb where B is the dominant allele (here, the brown eyes trait) and b is the recessive allele (here, the blue eyes trait). BB and Bb both result in brown eyes, whereas bb results in blue eyes.
So in total there are
alleles. The
homozygous blue-eyed cats contribute
of the b alleles, and the
heterozygous brown-eyed cats contribute
of the b alleles. Hence the frequency of the b allele in the population of cats is
. Thus, by some miracle, the probability that all the cats end up blue-eyed after a bunch of generations is essentially just
.
To prove this, we first need to formalize how exactly we are modelling the genetic drift, i.e. how the frequency of the alleles changes.
The Wright-Fisher Model
Under this model, we want to prove the following theorem:
THEOREM: The probability that eventually
will become fixed/constant at
(i.e. all alleles become
) is exactly
, the initial frequency of the
allele.
Proof
Reading Level: 6/5
Prerequisites: I hope you know what a martingale is
CLAIM: The stochastic process
is a martingale with respect to the natural filtration
.
Proof. Obviously
is
-measurable. Moreover
for all
, giving integrability. It remains to show that
. Indeed, we may compute:








Amazing.
.
We now define the stopping time
.
CLAIM:
almost surely.
Proof.
Whether
or not, we must have that
. That is, there's always a chance (however small) that the next generation keeps picking the same allele again and again. Then
. More generally, for the same reason, we have that
and so:
Summing:
But the LHS is precisely
, so
which gives, in particular, that
almost surely. 
At this point, we have established many powerful facts, many of which can be used to finish the proof. In particular we shall choose the properties that
almost surely and that
is a non-negative martingale. It then follows by Doob's Optional Stopping Theorem that
. But
, thus
. And,
, since
is just constant and equal to
by definition. Thus:

Which is exactly what we wanted to show. 
What the heck just happened???
Reading Difficulty: 0-1/5
Here is the proof in layman's terms.
The number blue alleles, over time, actually isn't really biased to go up over time or down over time. Even if almost all the alleles are blue, this is still true. This would be because, although it's likely that the number of blue alleles will go up a bit, there are small chances that the number of blue alleles can go down by a lot, which balances things out.
In other words, the mean or average number of blue alleles, over all the possible futures, will always keep staying the same, because at any point in time, the number of blue alleles isn't biased to go up or down.
The mean fraction of blue alleles at the start of time is just the starting fraction of blue alleles. This is because we're starting with only one possibility, and nothing has happened yet.
The mean fraction of blue alleles at the end of time is equal to the chance that all the alleles become blue. This is because at the end of time, every possibility ends up with all blue alleles or all brown alleles, and the chance that all the alleles are blue is just the fraction of these possibilities where all the alleles are blue. This is equal to the mean fraction of blue alleles.
These fractions are equal, so the chance that all the alleles end up being blue, after a bunch of generations, is just the starting fraction of blue alleles.
Reading Level: 2-3/5
Let's say there's a population of 100 cats, and:
- 50 of them have brown eyes (homozygous dominant),
- 30 of them also have brown eyes (heterozygous dominant), and
- 20 of them have blue eyes (homozygous recessive).
The answer, by some miracle, is that it's precisely the proportion of the cats' alleles that are the blue-eyes allele. As a reminder, a trait is determined by two alleles, and e.g. in this case is written as either BB, Bb, or bb where B is the dominant allele (here, the brown eyes trait) and b is the recessive allele (here, the blue eyes trait). BB and Bb both result in brown eyes, whereas bb results in blue eyes.
So in total there are







To prove this, we first need to formalize how exactly we are modelling the genetic drift, i.e. how the frequency of the alleles changes.
The Wright-Fisher Model
- (Assumptions) We assume that we begin with
alleles (realistically,
would be even). Each allele is either
or
. We don't care which is dominant and which is recessive, nor do we care how exactly they are paired up (this induces some inaccuracy, but whatever). We maintain that every generation will continue to have
alleles, determined by the previous generation.
- ("Base Case") On generation
, we say that
of the alleles are
, and the others are all
.
- (Drift Mechanism) If on generation
, we have that
of the alleles are
, then each of the
alleles in generation
is decided, independently of the others, by essentially choosing (uniformly) a random allele sample from generation
.
Hence a particular allele in generationhas
probability of being
and
probability of being
. In particular, the probability that generation
will have
alleles being
is given by:
So that essentially, given what
is, we have that
has a Binomial distribution of parameter
.
Under this model, we want to prove the following theorem:
THEOREM: The probability that eventually





Proof
Reading Level: 6/5
Prerequisites: I hope you know what a martingale is
CLAIM: The stochastic process


Proof. Obviously















We now define the stopping time

CLAIM:

Proof.
Whether










At this point, we have established many powerful facts, many of which can be used to finish the proof. In particular we shall choose the properties that











What the heck just happened???
Reading Difficulty: 0-1/5
Here is the proof in layman's terms.
The number blue alleles, over time, actually isn't really biased to go up over time or down over time. Even if almost all the alleles are blue, this is still true. This would be because, although it's likely that the number of blue alleles will go up a bit, there are small chances that the number of blue alleles can go down by a lot, which balances things out.
In other words, the mean or average number of blue alleles, over all the possible futures, will always keep staying the same, because at any point in time, the number of blue alleles isn't biased to go up or down.
The mean fraction of blue alleles at the start of time is just the starting fraction of blue alleles. This is because we're starting with only one possibility, and nothing has happened yet.
The mean fraction of blue alleles at the end of time is equal to the chance that all the alleles become blue. This is because at the end of time, every possibility ends up with all blue alleles or all brown alleles, and the chance that all the alleles are blue is just the fraction of these possibilities where all the alleles are blue. This is equal to the mean fraction of blue alleles.
These fractions are equal, so the chance that all the alleles end up being blue, after a bunch of generations, is just the starting fraction of blue alleles.
This post has been edited 9 times. Last edited by greenturtle3141, May 5, 2022, 3:50 AM