Probability that an allele frequency fixes in Wright-Fisher genetic drift

by greenturtle3141, May 5, 2022, 2:38 AM

Introduction

Reading Level: 2-3/5

Let's say there's a population of 100 cats, and:
  • 50 of them have brown eyes (homozygous dominant),
  • 30 of them also have brown eyes (heterozygous dominant), and
  • 20 of them have blue eyes (homozygous recessive).
What's the probability that eventually, after many generations, all the cats will end up having blue eyes?

The answer, by some miracle, is that it's precisely the proportion of the cats' alleles that are the blue-eyes allele. As a reminder, each cat carries two alleles for this trait, so its genotype is written as BB, Bb, or bb, where B is the dominant allele (here, the brown eyes trait) and b is the recessive allele (here, the blue eyes trait). BB and Bb both result in brown eyes, whereas bb results in blue eyes.

So in total there are $200$ alleles. The $20$ homozygous blue-eyed cats contribute $40$ of the b alleles, and the $30$ heterozygous brown-eyed cats contribute $30$ of the b alleles. Hence the frequency of the b allele in the population of cats is $70/200$. Thus, by some miracle, the probability that all the cats end up blue-eyed after a bunch of generations is essentially just $70/200 = \boxed{35\%}$.
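
The counting above can be sketched in a few lines of Python. The names `genotype_counts` and `allele_frequency` are made up for this illustration; the numbers are the ones from the cat example.

```python
# Genotype counts from the cat example (BB and Bb are brown-eyed, bb is blue-eyed).
genotype_counts = {"BB": 50, "Bb": 30, "bb": 20}

def allele_frequency(counts, allele):
    """Fraction of all allele copies in the population that equal `allele`."""
    total = 2 * sum(counts.values())  # every cat carries two alleles
    # str.count is case-sensitive, so "Bb".count("b") == 1 and "bb".count("b") == 2.
    copies = sum(genotype.count(allele) * n for genotype, n in counts.items())
    return copies / total

b_freq = allele_frequency(genotype_counts, "b")
print(b_freq)  # 0.35
```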

To prove this, we first need to formalize how exactly we are modelling the genetic drift, i.e. how the frequency of the alleles changes.

The Wright-Fisher Model
  • (Assumptions) We assume that we begin with $N$ alleles (realistically, $N$ would be even). Each allele is either $A$ or $B$. We don't care which is dominant and which is recessive, nor do we care how exactly they are paired up (this induces some inaccuracy, but whatever). We maintain that every generation will continue to have $N$ alleles, determined by the previous generation.
  • ("Base Case") On generation $0$, we say that $X_0 := pN$ of the alleles are $A$, and the others are all $B$.
  • (Drift Mechanism) If on generation $n$, we have that $X_n$ of the alleles are $A$, then each of the $N$ alleles in generation $n+1$ is decided, independently of the others, by copying an allele chosen uniformly at random (with replacement) from generation $n$.

    Hence a particular allele in generation $n+1$ has $X_n/N$ probability of being $A$ and $1-X_n/N$ probability of being $B$. In particular, the probability that generation $n+1$ will have $k$ alleles being $A$ is given by:
    $$\mathbb{P}(X_{n+1} = k \mid X_n) = \binom{N}{k}\left(\frac{X_n}{N}\right)^k\left(1-\frac{X_n}{N}\right)^{N-k}$$
    So, given $X_n$, the next count $X_{n+1}$ has a Binomial distribution with $N$ trials and success probability $X_n/N$.
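
Here is a minimal sketch of the drift mechanism as a simulation, using only Python's standard library. The function names `next_generation` and `run_until_fixation` and the seed are my own choices for illustration, and the starting values $X_0 = 70$, $N = 200$ are borrowed from the cat example.

```python
import random

def next_generation(x, n_alleles, rng):
    """Sample X_{n+1} given X_n = x.

    Each of the N new alleles copies a uniformly random parent allele,
    so it is A with probability x / N, independently of the others.
    """
    q = x / n_alleles
    return sum(rng.random() < q for _ in range(n_alleles))

def run_until_fixation(x0, n_alleles, rng):
    """Iterate the model until the A-count hits 0 or N; return (final count, generations)."""
    x, generations = x0, 0
    while 0 < x < n_alleles:
        x = next_generation(x, n_alleles, rng)
        generations += 1
    return x, generations

rng = random.Random(0)  # fixed seed so the run is reproducible
final, gens = run_until_fixation(70, 200, rng)
print(f"fixed at {final} after {gens} generations")
```

A single run only tells you which allele won this time; the theorem below is about how often each outcome happens.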

Under this model, we want to prove the following theorem:

THEOREM: The probability that eventually $X_n$ will become fixed/constant at $X_n=N$ (i.e. all alleles become $A$) is exactly $X_0/N = p$, the initial frequency of the $A$ allele.
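
Before proving it, the theorem can be sanity-checked by brute force. The sketch below is a Monte Carlo estimate under assumed small parameters ($N = 20$, $X_0 = 7$, 5000 trials, all picked by me to keep it fast), so it only gives an approximate answer that should land near $p = 0.35$.

```python
import random

def fixes_at_A(x0, n_alleles, rng):
    """Run one Wright-Fisher trajectory and report whether it ends with all alleles A."""
    x = x0
    while 0 < x < n_alleles:
        q = x / n_alleles
        x = sum(rng.random() < q for _ in range(n_alleles))
    return x == n_alleles

rng = random.Random(2022)  # arbitrary fixed seed
N, X0, TRIALS = 20, 7, 5000
estimate = sum(fixes_at_A(X0, N, rng) for _ in range(TRIALS)) / TRIALS
print(f"estimated fixation probability {estimate:.3f}, theorem says {X0 / N:.3f}")
```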

Proof

Reading Level: 6/5
Prerequisites: I hope you know what a martingale is

CLAIM: The stochastic process $X_n$ is a martingale with respect to the natural filtration $\mathcal{F}_n := \sigma(X_0,X_1,\cdots,X_n)$.

Proof. Obviously $X_n$ is $\mathcal{F}_n$-measurable. Moreover $\mathbb{E}|X_n| \leq N < \infty$ for all $n$, giving integrability. It remains to show that $\mathbb{E}(X_{n+1} | \mathcal{F}_n) = X_n$. Indeed, we may compute:
$$\mathbb{E}(X_{n+1} | \mathcal{F}_n) = \sum_{k=0}^N \binom{N}{k}\left(\frac{X_n}{N}\right)^k\left(1-\frac{X_n}{N}\right)^{N-k} \cdot k$$
$$ = \sum_{k=0}^N \frac{N!}{k!(N-k)!}\left(\frac{X_n}{N}\right)^k\left(1-\frac{X_n}{N}\right)^{N-k} \cdot k$$
$$ = \sum_{k=1}^N \frac{N!}{(k-1)!(N-k)!}\left(\frac{X_n}{N}\right)^k\left(1-\frac{X_n}{N}\right)^{N-k}$$
$$ = N\sum_{k=1}^N \frac{(N-1)!}{(k-1)!(N-k)!}\left(\frac{X_n}{N}\right)^k\left(1-\frac{X_n}{N}\right)^{N-k}$$
$$ = N\sum_{k=1}^N \binom{N-1}{k-1}\left(\frac{X_n}{N}\right)^k\left(1-\frac{X_n}{N}\right)^{N-k}$$
$$ = N\sum_{k=0}^{N-1} \binom{N-1}{k}\left(\frac{X_n}{N}\right)^{k+1}\left(1-\frac{X_n}{N}\right)^{N-1-k}$$
$$ = N \cdot \frac{X_n}{N}\sum_{k=0}^{N-1} \binom{N-1}{k}\left(\frac{X_n}{N}\right)^k\left(1-\frac{X_n}{N}\right)^{N-1-k}$$
$$ = X_n\left(\frac{X_n}{N} + 1-\frac{X_n}{N}\right)^{N-1}$$
$$ = X_n$$
Amazing. $\square$
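
If you'd rather not trust the algebra, the identity $\mathbb{E}(X_{n+1} | X_n = x) = x$ can be checked exactly for a small population. This is just a sketch: `conditional_mean` is a name I made up, $N = 12$ is an arbitrary choice, and exact rational arithmetic (`fractions.Fraction`) is used so there is no rounding to worry about.

```python
from fractions import Fraction
from math import comb

def conditional_mean(x, n_alleles):
    """Exact value of E[X_{n+1} | X_n = x] under the Binomial(N, x/N) transition."""
    q = Fraction(x, n_alleles)
    return sum(
        k * comb(n_alleles, k) * q**k * (1 - q) ** (n_alleles - k)
        for k in range(n_alleles + 1)
    )

N = 12
# The martingale property says the conditional mean equals the current count.
is_martingale_step = all(conditional_mean(x, N) == x for x in range(N + 1))
print(is_martingale_step)  # True
```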

We now define the stopping time $\tau = \inf\{n : X_n \in \{0,N\}\}$.

CLAIM: $\tau < \infty$ almost surely.

Proof.

Whatever the value of $X_0$, we have $\mathbb{P}(\tau \leq 1) \geq \frac{1}{N^N}$. Indeed, if $X_0 \in \{0,N\}$ then $\tau = 0$; otherwise at least one allele in generation $0$ is $A$, so the next generation consists entirely of $A$ with probability $(X_0/N)^N \geq \frac{1}{N^N}$. That is, there's always a chance (however small) that the next generation keeps picking the same allele again and again. Then $\mathbb{P}(\tau > 1) \leq 1-\frac{1}{N^N}$. More generally, since generation $n+1$ depends only on $X_n$, the same reasoning gives $\mathbb{P}(\tau > n+1 | \tau > n) \leq 1-\frac{1}{N^N}$, and so for $n \geq 1$:
$$\mathbb{P}(\tau > n) = \mathbb{P}(\tau>1)\prod_{k=1}^{n-1}\mathbb{P}(\tau > k+1 | \tau > k) \leq \left(1-\frac{1}{N^N}\right)^n$$
Summing (the bound holds trivially for $n=0$):
$$\sum_{n=0}^\infty \mathbb{P}(\tau > n) \leq \sum_{n=0}^\infty \left(1-\frac{1}{N^N}\right)^n < \infty$$
But the LHS is precisely $\mathbb{E} \tau$ (by the tail-sum formula for a non-negative integer-valued random variable), so $\mathbb{E} \tau < \infty$ which gives, in particular, that $\tau < \infty$ almost surely. $\square$
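
For the curious, the geometric series in this bound can be summed explicitly. Its ratio is $r = 1 - \frac{1}{N^N} < 1$, so:
$$\mathbb{E}\tau \leq \sum_{n=0}^\infty \left(1-\frac{1}{N^N}\right)^n = \frac{1}{1-\left(1-\frac{1}{N^N}\right)} = N^N$$
This is a very crude upper bound (for $N = 2$ it only says $\mathbb{E}\tau \leq 4$) and should not be read as an estimate of how long fixation actually takes. All the proof needs is that it is finite.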

At this point we have everything needed to finish the proof. We use two facts: $\tau < \infty$ almost surely, and $X$ is a bounded martingale (indeed $0 \leq X_n \leq N$ for all $n$). Boundedness matters here: non-negativity alone would only give $\mathbb{E}X_\tau \leq \mathbb{E}X_0$. It then follows by Doob's Optional Stopping Theorem that $\boxed{\mathbb{E}X_\tau = \mathbb{E}X_0}$. But $X_\tau \in \{0,N\}$, thus $\mathbb{E}X_\tau = N\mathbb{P}(X_\tau = N)$. Also, $\mathbb{E}X_0 = pN$, since $X_0$ is constant and equal to $pN$ by definition. Thus:
$$N\mathbb{P}(X_\tau = N) = pN$$
$$\mathbb{P}(X_\tau = N) = p$$
Since the states $0$ and $N$ are absorbing (once every allele is the same, every copy is the same too), $X_\tau = N$ means exactly that $X_n$ stays fixed at $N$ forever. This is what we wanted to show. $\square$
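
The conclusion can also be checked without any randomness, by pushing the exact distribution of $X_n$ through the binomial transition probabilities for many generations. The sketch below assumes a small population ($N = 10$, $X_0 = 3$, 500 generations, all my own choices) and uses floating point, so expect agreement only up to rounding. The helper names are hypothetical.

```python
from math import comb

def transition_row(x, n):
    """P(X_{t+1} = k | X_t = x) for k = 0..n, i.e. the Binomial(n, x/n) pmf."""
    q = x / n
    return [comb(n, k) * q**k * (1 - q) ** (n - k) for k in range(n + 1)]

def distribution_after(x0, n, generations):
    """Distribution of X_t after the given number of generations, starting from X_0 = x0."""
    rows = [transition_row(x, n) for x in range(n + 1)]
    dist = [0.0] * (n + 1)
    dist[x0] = 1.0
    for _ in range(generations):
        dist = [sum(dist[x] * rows[x][k] for x in range(n + 1)) for k in range(n + 1)]
    return dist

N, X0 = 10, 3
dist = distribution_after(X0, N, 500)
# Almost all mass should now sit on the absorbing states 0 and N.
print(f"P(all A) = {dist[N]:.6f}, P(all B) = {dist[0]:.6f}, p = {X0 / N}")
```

The mass on state $N$ should come out very close to $p = 0.3$, and the mass on state $0$ very close to $1 - p$.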

What the heck just happened???

Reading Level: 0-1/5

Here is the proof in layman's terms.

The number of blue alleles isn't actually biased to go up or down over time. This is true even if almost all the alleles are blue: although it's likely that the number of blue alleles will go up a bit, there is a small chance that it will go down by a lot, which balances things out.

In other words, the mean or average number of blue alleles, over all the possible futures, will always keep staying the same, because at any point in time, the number of blue alleles isn't biased to go up or down.

The mean fraction of blue alleles at the start of time is just the starting fraction of blue alleles. This is because we're starting with only one possibility, and nothing has happened yet.

The mean fraction of blue alleles at the end of time is equal to the chance that all the alleles become blue. This is because at the end of time, every possibility ends up with all blue alleles or all brown alleles, and the chance that all the alleles are blue is just the fraction of these possibilities where all the alleles are blue. This is equal to the mean fraction of blue alleles.

These fractions are equal, so the chance that all the alleles end up being blue, after a bunch of generations, is just the starting fraction of blue alleles.
This post has been edited 9 times. Last edited by greenturtle3141, May 5, 2022, 3:50 AM

