A conditional coalescent for diploid exchangeable population models given the pedigree

Frederic Alberti, Matthias Birkner, Wai-Tong Louis Fan, John Wakeley

TL;DR

This work analyzes gene genealogies conditional on a fixed population pedigree under the diploid Cannings model, revealing that, in the large-$N$ limit, the quenched limiting process can differ dramatically from the marginal coalescent when multiple mergers are possible. The authors develop an inhomogeneous $(\Psi,c)$-coalescent to capture timeline-structured large-family events (GLIPs) via a Poisson point process $\Psi$ with intensity $d t\,\Xi(d x)/\langle x,x\rangle$ and a constant pair-merger rate $c_{\text{pair}}=1-\Xi(\Delta\setminus\{0\})$, with time rescaled by $c_N$ to unit scale. A key methodological contribution is a coupling framework for two coalescents on the same pedigree and a coarse-graining argument that reduces the pedigree to a paintbox-driven mechanism for GLIPs, enabling a rigorous convergence to the inhomogeneous coalescent. The results show fundamental differences between quenched and annealed genealogies, with concrete implications for the site-frequency spectrum and multi-locus statistics, and provide a suite of examples (Wright–Fisher, random fitness, occasional large families) along with simulations illustrating pedigree-driven variation in genetic data. These findings have practical impact on inference and simulation in populations with highly skewed reproductive success, and they extend prior work by incorporating arbitrary sample sizes and a full pedigree-conditioned limiting process. The framework paves the way for further extensions to recombination, sex structure, and forward-time duals.
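The mechanism described above can be illustrated with a minimal simulation sketch. It assumes the simplest case in which $\Xi$ is concentrated on a single mass vector, so that the points of $\Psi$ reduce to a list of GLIP times drawn once and then held fixed (the "pedigree"), while each unlinked locus is simulated independently on that same list. The function names, the parameters `glip_rate`, `pair_rate` and `horizon`, and the plain paintbox step are assumptions made for illustration; the sketch ignores the diploid bookkeeping of the full model and is not the authors' simulation code.

```python
import random


def sample_glip_times(glip_rate, horizon, rng):
    """Draw GLIP times as a homogeneous Poisson process on [0, horizon]."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(glip_rate)
        if t > horizon:
            return times
        times.append(t)


def paintbox_merge(blocks, masses, rng):
    """Each block lands in box i with probability masses[i], else stays alone.

    Blocks that land in the same box merge into one block.
    """
    boxes, survivors = {}, []
    for block in blocks:
        u, acc = rng.random(), 0.0
        for i, mass in enumerate(masses):
            acc += mass
            if u < acc:
                boxes.setdefault(i, []).append(block)
                break
        else:
            survivors.append(block)  # fell into the "dust", no merger
    for members in boxes.values():
        survivors.append(frozenset().union(*members))
    return survivors


def total_length_given_glips(n, glip_times, masses, pair_rate, rng):
    """Total branch length of one locus, conditional on the fixed GLIP times.

    Assumption: after the last listed GLIP only pair mergers occur.
    """
    if pair_rate <= 0:
        raise ValueError("pair_rate must be positive to reach the MRCA")
    blocks = [frozenset([i]) for i in range(n)]
    t, total = 0.0, 0.0
    for next_glip in list(glip_times) + [float("inf")]:
        # Pair mergers at rate pair_rate per pair until the next GLIP.
        while len(blocks) > 1:
            k = len(blocks)
            wait = rng.expovariate(pair_rate * k * (k - 1) / 2)
            if t + wait >= next_glip:
                break
            t += wait
            total += k * wait
            i, j = rng.sample(range(k), 2)
            merged = blocks[i] | blocks[j]
            blocks = [b for idx, b in enumerate(blocks) if idx not in (i, j)]
            blocks.append(merged)
        if len(blocks) == 1:
            return total
        # Run all lineages up to the GLIP, then apply the paintbox there.
        total += len(blocks) * (next_glip - t)
        t = next_glip
        blocks = paintbox_merge(blocks, masses, rng)
    return total


rng = random.Random(1)
glips = sample_glip_times(glip_rate=2.0, horizon=50.0, rng=rng)  # one fixed "pedigree"
masses = (0.125, 0.125)  # psi/4 twice, with psi = 0.5 (illustrative choice)
lengths = [total_length_given_glips(10, glips, masses, 1.0, rng) for _ in range(200)]
```

Reusing `glips` across all loci is what makes the sketch quenched: redrawing the GLIP times for every locus would instead sample from the annealed model.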

Abstract

We study coalescent processes conditional on the population pedigree under the exchangeable diploid bi-parental population model of \citet{BirknerEtAl2018}. While classical coalescent models average over all reproductive histories, thereby marginalizing the pedigree, our work analyzes the genealogical structure embedded within a fixed pedigree generated by the diploid Cannings model. In the large-population limit, we show that these conditional coalescent processes differ significantly from their marginal counterparts when the marginal coalescent process includes multiple mergers. We characterize the limiting process as an inhomogeneous $(\Psi,c)$-coalescent, where $\Psi$ encodes the timing and scale of multiple mergers caused by generations with large individual progeny (GLIPs), and $c$ is a constant rate governing binary mergers. Our results reveal fundamental distinctions between quenched (conditional) and annealed (classical) genealogical models, demonstrate how the fixed pedigree structure impacts multi-locus statistics such as the site-frequency spectrum, and have implications for interpreting patterns of genetic variation among unlinked loci in the genomes of sampled individuals. They significantly extend the results of \citet{DiamantidisEtAl2024}, which considered a sample of size two under a specific Wright-Fisher model with a highly reproductive couple, and those of \citet{TyukinThesis2015}, where the Kingman coalescent was the limiting process. Our proofs adapt coupling techniques from the theory of random walks in random environments.

Paper Structure

This paper contains 35 sections, 13 theorems, 191 equations, 4 figures.

Key Result

Lemma 2.5

The processes $(X_j(g))_{g \in \mathbb{N}_0}$ for $j\in [n]$ form a family of coalescing random walks on $\{0,1\} \times [N]$, whose transition probabilities, conditional on the pedigree $(P_{0}^{(N)},P_{1}^{(N)})$, are given by for $c\in\{0,1\}$ and $g \in \mathbb{N}_0$. For any choice of indices $i_1, \ldots, i_r \in [n]$ such that $X_{i_1}(g),\ldots,X_{i_r}(g)$ are pairwise

Figures (4)

  • Figure 1: Two different genealogies on the same pedigree. The two figures above are two realizations of the Mendelian coin flips in the population of $N=6$ individuals with the same pedigree. Consider a sample of $n=3$ genes at generation $g=0$. The sample $(X_i(0))_{1\leqslant i \leqslant 3}$ is described as follows: $X_1(0)=(0,2)$ is the 0-gene of the second individual from the left, and $X_2(0)=(0,5)$ and $X_3(0)=(1,5)$ both belong to the fifth individual. The ancestral process $\Pi^{6,3}$ in Definition \ref{def:Pi^N.n} is shown by the thick edges, with $\Pi^{6,3}_0=\{ \{1\},\{\{2\}, \{3\}\}\}\in \mathcal{S}_3$. Our main question concerns the conditional distribution of the ancestral process for a sample given the pedigree.
  • Figure 2: Log base $10$ of the expected SFS, specifically the expected proportion of polymorphic sites for each possible count of the mutant base in a sample of size $n=100$, for three values of $\psi\in\{0.1,0.5,0.9\}$ under the $\delta_{(\frac{\psi}{4},\frac{\psi}{4},0,0,\ldots)}$ model with a Kingman component, in which $\lambda$ is the rate of $\delta_{(\frac{\psi}{4},\frac{\psi}{4},0,0,\ldots)}$ events relative to Kingman events. In the top row ($\lambda=10^6$) large reproduction events dominate Kingman events. In the bottom row ($\lambda=1.0$) both occur at the same rate. The five lines in each panel are the SFS for five independent pedigrees (lists of times of large families). For each of these, the SFS was estimated from the simulated gene genealogies of one million unlinked loci.
  • Figure 3: Effects of pedigrees on the total length of the gene genealogy for $n=100$ under the $\delta_{(\frac{1}{4},\frac{1}{4},0,0,\ldots)}$ model with a negligible Kingman component ($\lambda=10^6$). Left: Relative importance of the two sources of variation in $T_\textrm{total}$ among loci from \eqref{eq:totalVarTtotal} expressed as fractions of the total variance, for values of $\psi$ ranging from $0.025$ to $1$. The contributions sum to one, but both are displayed for illustration. Right: Distributions of $T_\textrm{total}$ among $50000$ unlinked loci given each of two randomly generated pedigrees for the case $\psi=1$, compared to the corresponding distribution (in blue) for the annealed model, i.e. the distribution of $T_\textrm{total}$ at a single locus among $50000$ randomly generated pedigrees.
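The equation referenced in the Figure 3 caption is not reproduced on this page. A decomposition of the variance of $T_\textrm{total}$ into two contributions that sum to the total is what the law of total variance provides, so the split is presumably of the following standard form, where $\mathcal{G}$ is a symbol assumed here for the pedigree:

$$
\operatorname{Var}(T_\textrm{total})
= \mathbb{E}\big[\operatorname{Var}(T_\textrm{total} \mid \mathcal{G})\big]
+ \operatorname{Var}\big(\mathbb{E}[T_\textrm{total} \mid \mathcal{G}]\big).
$$

Under this reading, the first term is the variation among unlinked loci within a fixed pedigree and the second is the variation between pedigrees. Dividing both terms by $\operatorname{Var}(T_\textrm{total})$ gives two fractions that sum to one.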

Theorems & Definitions (57)

  • Definition 2.1
  • Definition 2.2
  • Definition 2.3
  • Definition 2.4
  • Lemma 2.5
  • proof
  • Remark 2.6
  • Remark 2.7
  • Definition 2.8
  • Remark 3.1: Two-sex model
  • ...and 47 more