A master equation approach to the n-coalescent problem

Bahram Houchmandzadeh

TL;DR

The paper addresses the n-coalescent problem by reframing it in terms of the joint state $(n,t)$ and its probability $P(n,t)$, from which the MRCA distribution and tree-width statistics can be derived. It develops a master equation with a triangular jump generator $W$, enabling exact solutions for both the continuous-time Moran and the discrete-time Wright-Fisher models, and provides explicit expressions for $P(n,t)$, the MRCA CDF $F_1(t)=P(1,t)$, and the moments of $t$ and $n(t)$ via a recursion on an amplitude matrix $A$. The key contribution is a general, efficient framework that unifies continuous and discrete coalescent analyses and extends naturally to multi-step models, which makes the problem easier to treat both analytically and numerically. The same master-equation formalism provides a foundation for coalescent computations and can be extended to incorporate structure, as in the structured coalescent.

Abstract

Given an evolutionary model, such as Wright--Fisher (WF) or Moran, the n-coalescent problem consists of going backward in time to find, for example, the time to the most recent common ancestor (MRCA) and the topology of the tree. In the literature, this problem is mostly tackled by directly computing the random variable t, the time to reach the MRCA. I show here that shifting the focus from the random variable t to the joint variable (n,t), where n is the number of ancestors at time t, greatly simplifies the problem. Indeed, P(n,t), the probability of this variable, obeys a simpler master equation that can be solved in a straightforward way for the most general model. This probability can then be used to compute relevant information about the n-coalescent, for both random variables $t_{n}$ (the random time to reach a given state n) and $n_{t}$ (the random number of ancestors at a given time t). For example, the cumulative distribution function of $t_{1}$ is $P(1,t)$. In this article, I give the general solution for continuous-time models such as Moran and for discrete-time models such as WF.
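
The statement that the cumulative distribution function of $t_{1}$ is $P(1,t)$ can be checked numerically with a minimal simulation sketch. The code below is an illustration under stated assumptions, not the paper's implementation: it assumes a single-step backward process with Kingman-like rates $W_{n}=n(n-1)/2$ (the exact Moran rates are not given in this summary), and the names `coalescence_time`, `empirical_cdf` and `kingman_rate` are hypothetical.

```python
import random


def kingman_rate(n):
    """Assumed coalescence rate out of state n (Kingman-like, not the paper's Moran rate)."""
    return n * (n - 1) / 2


def coalescence_time(n0, rate, rng):
    """Simulate one backward path n0 -> 1 and return t_1, the time at which n = 1 is reached."""
    t, n = 0.0, n0
    while n > 1:
        # Gillespie step: exponential waiting time with the rate of the current state.
        t += rng.expovariate(rate(n))
        n -= 1  # single-step assumption: one coalescence per event
    return t


def empirical_cdf(samples, t):
    """Fraction of paths that have reached n = 1 by time t: an estimate of P(1, t)."""
    return sum(s <= t for s in samples) / len(samples)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
times = [coalescence_time(10, kingman_rate, rng) for _ in range(20000)]
mean_t1 = sum(times) / len(times)
```

With these assumed rates, the exact mean of $t_{1}$ is $2(1-1/n_{0})$, so `mean_t1` should be close to 1.8 for $n_{0}=10$, and `empirical_cdf(times, t)` gives a Monte Carlo estimate of $P(1,t)$.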

Paper Structure

This paper contains 16 sections, 66 equations, 12 figures.

Figures (12)

  • Figure 1: Illustration of a Moran coalescent tree (blue lines) with initial condition $n_{0}=15$ and population size $N=25$. Coalescent events are marked by black circles and the corresponding coalescent times by dotted horizontal lines. The red line $n(t)$ is a random path giving the number of ancestors as a function of (backward) time for this particular realization.
  • Figure 2: Illustration of a Moran coalescent, with initial condition $n_{0}=15$ and population size $N=25$. Thin background lines represent Gillespie simulations of the random paths $n(t)$. Statistical properties of the coalescent process can be extracted from these random paths: (a) Vertical slicing: at a given state $n$, the time $t_{n}$ for each path is recorded; from these times, statistical quantities such as $\left\langle t_{n}\right\rangle$, the average time to reach state $n$, are computed (here over 1000 paths); vertical bars denote the 10%-90% confidence interval. (b) Horizontal slicing: at a given time $t$, the state $n(t)$ for each path is recorded; from these states, statistical quantities such as $\left\langle n(t)\right\rangle$, the average tree width (number of ancestors) as a function of time, are computed; horizontal bars denote the 10%-90% confidence interval.
  • Figure 3: Scheme of the Moran model.
  • Figure 4: Graphical illustration of the construction of the triangular amplitude matrix $A$. Beginning with row $n_{0}$, where $A_{n_{0}}^{n_{0}}=1$, each row $n$ is obtained from the row below by multiplying each element by the factor $f_{k}=W_{n+1}/(W_{n}-W_{k})$. The diagonal element is obtained by summing the non-diagonal elements of the row and inverting the sign. The matrix has the following properties: each column (except the first one) and each row (except the last one) sums to zero, and $A_{1}^{1}=1$.
  • Figure 5: Comparison of the theoretical expression (\ref{eq:moran:direct:t;n}) for the probabilities $P(n,t)$ to numerical simulations of the stochastic process using the Gillespie algorithm, for $N=50$ and $n_{0}=10$. The numerical probabilities are computed from $10^{5}$ random paths.
  • ...and 7 more figures
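
The row-by-row construction of the amplitude matrix described in the caption of Figure 4 can be sketched in a few lines. This is a hedged illustration rather than the paper's code: it assumes a single-step process in which $P(n,t)=\sum_{k\ge n}A_{n}^{k}e^{-W_{k}t}$ with distinct rates, and it uses Kingman-like rates $W_{n}=n(n-1)/2$ as a stand-in because the Moran rates are not given in this summary. The function names `amplitude_matrix` and `prob` are hypothetical.

```python
import math


def amplitude_matrix(W, n0):
    """Build A[n][k] for 1 <= n <= k <= n0, following the recursion of the Figure 4 caption."""
    A = [[0.0] * (n0 + 1) for _ in range(n0 + 1)]  # index 0 is unused padding
    A[n0][n0] = 1.0  # starting row: A_{n0}^{n0} = 1
    for n in range(n0 - 1, 0, -1):
        for k in range(n + 1, n0 + 1):
            # Non-diagonal element: element of the row below times f_k = W_{n+1} / (W_n - W_k).
            A[n][k] = A[n + 1][k] * W[n + 1] / (W[n] - W[k])
        # Diagonal element: minus the sum of the non-diagonal elements of the row.
        A[n][n] = -sum(A[n][k] for k in range(n + 1, n0 + 1))
    return A


def prob(A, W, n, t):
    """Assumed form of the solution: P(n, t) = sum_k A[n][k] * exp(-W[k] * t)."""
    n0 = len(A) - 1
    return sum(A[n][k] * math.exp(-W[k] * t) for k in range(n, n0 + 1))


n0 = 6
W = [n * (n - 1) / 2 for n in range(n0 + 1)]  # assumed rates; W[1] = 0 is the absorbing state
A = amplitude_matrix(W, n0)
total = sum(prob(A, W, n, 0.7) for n in range(1, n0 + 1))   # should be close to 1
mean_t1 = -sum(A[1][k] / W[k] for k in range(2, n0 + 1))    # integral of 1 - P(1, t) over t
```

Under these assumptions the properties listed in the caption can be checked directly: `A[1][1]` equals 1, every row except row $n_{0}$ sums to zero, and `total` stays equal to 1 at any time. The recursion requires all rates to be distinct, since $W_{n}-W_{k}$ appears in a denominator.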