Weak convergence of the scaled jump chain and number of mutations of the Kingman coalescent
Martina Favero, Henrik Hult
TL;DR
This work analyzes the large-sample asymptotics of the Kingman coalescent under neutrality and a finite-alleles mutation scheme by studying the joint block-counting and mutation-counting process $\mathbf{Z}^{(n)}=(\mathbf{Y}^{(n)},\mathbf{M}^{(n)})$. The authors prove that, after appropriate scaling, the block-counting component $\mathbf{Y}^{(n)}$ converges weakly to a deterministic path $\mathbf{Y}(s)$, while the mutation-counting component $\mathbf{M}^{(n)}$ converges weakly to independent time-inhomogeneous Poisson processes $\mathbf{M}$ with intensities $\lambda_{ij}(\mathbf{Y}(s))= \frac{\theta P_{ij} Y_i(s)}{\|\mathbf{Y}(s)\|_1^2}$. The result is established first under parent-independent mutation (PIM) and then extended to general mutation via a novel change-of-measure technique. This technique uses Radon–Nikodym factors $r^{(n)}_{P,Q}$ and $c_{P,Q}$ to transfer convergence results from the PIM setting to the general mutation setting, and it is supported by a technical Ethier–Kurtz-type framework that handles explosion near the boundary. The results provide a rigorous basis for large-sample inference in population genetics with general mutation schemes and illuminate the joint behavior of lineage counts and mutation counts in the Kingman coalescent limit.
Abstract
The Kingman coalescent is a fundamental process in population genetics that models the ancestry of a sample of individuals backwards in time. In this paper, we study asymptotic properties of the coalescent in a large-sample-size regime, under neutrality and a general finite-alleles mutation scheme, i.e., one that includes both parent-independent and parent-dependent mutation. In particular, we consider a sequence of Markov chains that is related to the coalescent and consists of block-counting and mutation-counting components. We show that these components, suitably scaled, converge weakly to deterministic components and Poisson processes with varying intensities, respectively. Along the way, we develop a novel approach, based on a change of measure, to generalise the convergence result from the parent-independent to the parent-dependent mutation setting, in which several crucial quantities are not known explicitly.
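To make the block-counting and mutation-counting components concrete, the following is a minimal Python sketch of a much simpler object than the chain studied in the paper: the untyped jump chain of the coalescent with mutation, in which allele types are ignored. It assumes the common convention that each pair of lineages coalesces at rate 1 and each lineage mutates at rate $\theta/2$, so that with $k$ lineages the next event is a mutation with probability $\theta/(k-1+\theta)$. The function names are hypothetical, and neither the typed transition probabilities nor the scaling used in the paper are reproduced here.

```python
import random


def simulate_untyped_jump_chain(n, theta, rng):
    """Run the untyped jump chain from n lineages down to 1.

    Returns a list whose entry k is the number of mutation events that
    occurred while k lineages were present (entries 0 and 1 stay zero).
    Assumption: coalescence rate 1 per pair, mutation rate theta/2 per lineage.
    """
    mutations_at_level = [0] * (n + 1)
    k = n
    while k > 1:
        # Total coalescence rate is k(k-1)/2 and total mutation rate is
        # k*theta/2, so the next event is a mutation with probability
        # theta / (k - 1 + theta).
        if rng.random() < theta / (k - 1 + theta):
            mutations_at_level[k] += 1  # mutation: lineage count unchanged
        else:
            k -= 1  # coalescence: one fewer lineage
    return mutations_at_level


def mean_total_mutations(n, theta, runs, seed=0):
    """Monte Carlo estimate of the mean total number of mutations."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        total += sum(simulate_untyped_jump_chain(n, theta, rng))
    return total / runs


def expected_total_mutations(n, theta):
    """Exact mean under the stated convention: theta * (1 + 1/2 + ... + 1/(n-1))."""
    return theta * sum(1.0 / j for j in range(1, n))


if __name__ == "__main__":
    n, theta = 50, 1.0
    print("simulated mean:", mean_total_mutations(n, theta, runs=2000))
    print("exact mean:    ", expected_total_mutations(n, theta))
```

In this simplified chain, the number of mutations that occur while $k$ lineages are present is geometric with mean $\theta/(k-1)$, so most of the lineage count is lost in the first few steps while mutations accumulate mainly near the root. The paper's result concerns the typed version of this picture, where the mutation counts are tracked per pair of types $(i,j)$.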
