Conditional gene genealogies given the population pedigree for a diploid Moran model with selfing

Maximillian Newman, John Wakeley, Wai-Tong Louis Fan

TL;DR

This work develops a conditional coalescent framework for a diploid Moran population with selfing, demonstrating that conditioning on the population pedigree yields three distinct limiting regimes as population size grows: negligible outcrossing, limited outcrossing described by an ancestral graph and random-walk meeting times, and partial selfing, where familiar Kingman-like dynamics re-emerge. The authors introduce a detailed Moran-based pedigree model, derive the unconditional limiting distribution of pairwise coalescence times, and prove conditional limit theorems for two-gene samples under each regime, both for samples from different individuals and for samples from the same individual. They further extend the analysis to larger samples via conjectured limits and quantify how pedigree structure induces variance components and covariances in coalescence times, linking these to identity disequilibrium and multi-locus variation. The results emphasize that pedigree-informed coalescent models can capture genome-wide heterogeneity in genealogies and offer a principled alternative to pedigree-averaged coalescents for interpreting multi-locus data in populations with substantial selfing.

Abstract

We introduce a stochastic model of a population with overlapping generations and arbitrary levels of self-fertilization versus outcrossing. We study how the global graph of reproductive relationships, or population pedigree, influences the genealogical relationships of a sample of two gene copies at a genetic locus. Specifically, we consider a diploid Moran model with constant population size $N$ over time, in which a proportion of offspring are produced by selfing. We show that the conditional distribution of the pairwise coalescence time at a single locus given the random pedigree converges to a limit law as $N$ tends to infinity. The distribution of coalescence times obtained in this way predicts variation among unlinked loci in a sample of individuals. Traditional coalescent analyses implicitly average over pedigrees and generally make different predictions. We describe three different behaviors in the limit depending on the relative strengths, from large to small, of selfing versus outcrossing: partial selfing, limited outcrossing, and negligible outcrossing. In the case of partial selfing, coalescence times are related to the Kingman coalescent, similar to what is found in traditional analyses. In the case of limited outcrossing, the retained pedigree information forms a random graph, with coalescence times given by the meeting times of random walks on this graph. In the case of negligible outcrossing, which represents complete or nearly complete selfing, coalescence times are determined entirely by the fixed times to common ancestry of diploid individuals in the pedigree.
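
The three regimes named in the abstract can be stated compactly using the conditions given in the Figure 3 caption. The display below is a summary sketch, not a statement taken from the paper: it reads ${\rm Exp}(\cdot)$ as parameterized by rate (consistent with the caption's "rate 2" at $\alpha=1$), assumes $\lambda\in(0,\infty)$ in the middle case, and introduces the symbol $E$ for the random jump time purely for illustration.

```latex
\lim_{N\to\infty} N(1-\alpha_N) =
\begin{cases}
\infty, & \text{partial selfing: deterministic limiting CDF } t \mapsto 1 - e^{-2t/(2-\alpha)},\\[2pt]
\lambda \in (0,\infty), & \text{limited outcrossing: random limiting CDF, determined by } G_\lambda,\\[2pt]
0, & \text{negligible outcrossing: random limiting CDF } t \mapsto \mathbf{1}\{t \ge E\},\ E \sim {\rm Exp}(2).
\end{cases}
```

In every case time is measured in units of $N^2$ time-steps, matching the scaling $N^{-2}\tau^{(N)}$ used in Theorem 3.1 and Figure 3.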

Paper Structure

This paper contains 26 sections, 31 theorems, 114 equations, 8 figures.

Key Result

Theorem 3.1

Suppose $\alpha_N \rightarrow \alpha \in[0,1]$ as $N\to\infty$. Then $N^{-2}\tau^{(N)}$ converges in distribution: for any fixed $t \geq 0$

Figures (8)

  • Figure 1: Cumulative distribution functions of the coalescence times for a sample of size $n=2$ on each of $50$ population pedigrees simulated under the Wright-Fisher model with partial selfing and $N=1000$ individuals. Panels (a), (b), and (c) show the results for selfing probabilities $\alpha = 0.9$, $0.99$, and $0.999$, respectively. Coalescence probabilities in each generation given the pedigree and the sampled individuals were calculated exactly, up to numerical precision.
  • Figure 2: A realization of our diploid Moran process with $N = 6$ individuals including the genetic transmission events at one locus (left image) and the corresponding pedigree (right image). Also on the left, two gene copies $(X_0, Y_0) = (2, 9)$ are sampled in the present time-step $0$, and their lineages are highlighted by the solid dots. Namely, $\{(X_k, Y_k)\}_{k=0}^4=\{(2,9), (2,9), (2,5), (6,5), (4,4)\}$. These two lineages coalesce in time-step $4$ because $X_4=Y_4$ but $X_k\neq Y_k$ for $k=0,1,2,3$.
  • Figure 3: The limiting conditional CDF $t\mapsto \lim_{N\to\infty}\mathbb{P}_{{\rm diff}}(N^{-2}\tau^{(N)} \leq t | \mathcal{A}_N)$ for our Moran model under three assumptions about $\alpha_N$: partial selfing, limited outcrossing, and negligible outcrossing. (Left) Partial selfing, where $\lim_{N} N(1-\alpha_N)=\infty$. The limiting CDF is deterministic and is exactly the CDF of an exponential random variable ${\rm Exp}\left(\frac{2}{2-\alpha}\right)$, with $\alpha=1$ for comparison with those at center and right. (Center) Five realizations of the limiting CDF for $\lambda = 5$ under limited outcrossing, i.e. with $\lim_{N} N(1-\alpha_N)=\lambda = 5$. The limiting CDF is random and is described precisely in Theorem \ref{T:MAIN_conditional}. (Right) Five realizations of the limiting CDF for the case of negligible outcrossing, where $\lim_{N} N(1-\alpha_N)=0$. The limiting CDF is random, and it is the CDF of a random constant that is exponentially distributed with rate 2, i.e. the CDF is a Heaviside function with exponentially distributed jump time.
  • Figure 4: (Left) A realization of the random ancestral graph $G_{\lambda}$ starting with two particles/nodes. For this particular realization, the overlap times of the sample lineages are $t_1< t_2< t_3$. (Right) The CDF corresponding to this $G_\lambda$. The conditional distribution of the limiting coalescence time $T_\lambda$, given $G_\lambda$, is obtained by tracing ancestral genetic lineages backwards in time along the graph $G_\lambda$. Given this $G_\lambda$, $T_\lambda$ must take values in $\{t_1, t_2, t_3\}$. One can read off the conditional CDF of $T_\lambda$ as follows, by tracing the ancestries of a hypothetically infinite number of unlinked loci. The node of $G_{\lambda}$ at time $t_1$ contains $3/4$ of the ancestries from the right sample lineage and the whole of those from the left sample lineage, so $\mathbb{P}(T_\lambda=t_1 | G_\lambda ) = 3/4$. Between $t_1$ and $t_2$, $1/2$ of the loci following the left ancestral lineage go left at the split and the other $1/2$ continue up. The lineage on the right between $t_1$ and $t_2$ contains $1/4$ of the ancestries from the right sample lineage. So when these two lineages overlap at $t_2$ we have $\mathbb{P}(T_\lambda = t_2 | G_\lambda) = 1/8$. Finally, the remainder of the right sample lineage meets the remainder of the left sample lineage at $t_3$, giving $\mathbb{P}(T_\lambda = t_3 | G_\lambda) = 1/8$.
  • Figure 5: A realization of $G^N$ containing potential ancestors of the two sampled individuals (ovals) at the bottom. In past time-step $3$, the ancestral individual on the left undergoes an outcrossing event. In past time-step $4$, one of the nodes from this event is the offspring of one of the potential ancestral individuals of the sampled individual on the right, and these two nodes coagulate. In past time-step $5$, the rightmost particle experiences an outcrossing in which exactly one of the parents is a node in the graph. This may appear in the discrete-time ancestral graph but will not occur in the limiting ancestral graph.
  • ...and 3 more figures
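
The arithmetic in the Figure 4 caption can be checked with a short calculation. The sketch below is illustrative only: it takes the fractions $3/4$, $1/4$ and $1/2$ from the caption, while the numeric overlap times and the names `conditional_cdf`, `times` and `masses` are hypothetical, since the caption specifies only the ordering $t_1 < t_2 < t_3$.

```python
from fractions import Fraction

# Shares of ancestries read off the Figure 4 caption.
right_in_node_t1 = Fraction(3, 4)  # right sample lineage present in the node at t1
left_in_node_t1 = Fraction(1)      # the whole left sample lineage is in that node
p_t1 = right_in_node_t1 * left_in_node_t1

right_remaining = 1 - right_in_node_t1  # 1/4 of the right lineage has not yet overlapped
left_branch_share = Fraction(1, 2)      # the left lineage splits 1/2 and 1/2 after t1
p_t2 = right_remaining * left_branch_share

p_t3 = 1 - p_t1 - p_t2  # all remaining loci coalesce at the last overlap time


def conditional_cdf(t, times, masses):
    """Step-function CDF of T_lambda given G_lambda, evaluated at time t."""
    return sum(m for s, m in zip(times, masses) if s <= t)


# Hypothetical overlap times; only their ordering t1 < t2 < t3 comes from the caption.
times = [Fraction(1, 5), Fraction(1, 2), Fraction(9, 10)]
masses = [p_t1, p_t2, p_t3]

for t in times:
    print(f"t = {float(t):.2f}  CDF = {conditional_cdf(t, times, masses)}")
```

The conditional CDF is a step function with jumps of $3/4$, $1/8$ and $1/8$ at the three overlap times, so it reaches $1$ at $t_3$. Exact rational arithmetic is used so that the masses sum to one without rounding error.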

Theorems & Definitions (46)

  • Remark 2.1
  • Theorem 3.1: Unconditional limiting distribution
  • Remark 3.2
  • Theorem 4.1
  • Remark 4.2
  • Definition 4.3: The ancestral graph $G_\lambda$
  • Definition 4.4: The coalescence time $T_\lambda$
  • Theorem 4.5
  • Conjecture 4.6
  • Lemma 5.1
  • ...and 36 more