A Statistical Test for Comparing the Linkage and Admixture Model Based on Central Limit Theorems
Carola Sophia Heinzel
TL;DR
This work addresses how to decide between the Admixture Model and the Linkage Model for ancestry data. It establishes consistency and central limit theorems for maximum likelihood estimators in the Linkage Model, which yields a principled asymptotic likelihood-ratio test. It proves identifiability and asymptotic normality, derives the test statistic $\Lambda$ with a $\chi^2_1$ null distribution, and applies the method to simulated data and to ancestry-informative markers (AIMs) from the 1000 Genomes Project to illustrate per-individual model selection and marker-set assessment. The results provide a rigorous statistical foundation for model selection in ancestry inference under time-inhomogeneous HMMs. The approach also makes it possible to quantify the uncertainty of ancestry estimates and supports data-driven decisions on marker sets and recombination-rate assumptions in real genomic datasets.
Abstract
In the Admixture Model, the probability that an individual carries a certain allele at a specific marker depends on the allele frequencies in $K$ ancestral populations and the proportion of the individual's genome originating from these populations. The markers are assumed to be independent. The Linkage Model is a Hidden Markov Model (HMM) that extends the Admixture Model by incorporating linkage between neighboring loci. We prove consistency and asymptotic normality of maximum likelihood estimators (MLEs) for the ancestry of individuals in the Linkage Model, complementing earlier results for the Admixture Model \citep{pfaff2004information, pfaffelhuber2022central, HEINZEL2025}. These results are used to prove that a statistical test for model selection between the Admixture Model and the Linkage Model is an asymptotic level-$\alpha$ test. Finally, we demonstrate the practical relevance of our results by applying the test to real-world data from the 1000 Genomes Project.
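To make the Admixture Model concrete, the following sketch uses notation that is assumed here for illustration and is not fixed by the text above: $q_k$ denotes the proportion of the individual's genome originating from ancestral population $k$, and $p_{k,m}$ denotes the frequency of the considered allele at marker $m$ in population $k$. In the standard formulation of the model, the probability that a single allele copy at marker $m$ is of the considered type is the mixture
\[
  \theta_m(q) = \sum_{k=1}^{K} q_k \, p_{k,m}, \qquad q_k \ge 0, \quad \sum_{k=1}^{K} q_k = 1 .
\]
Under the additional assumption of a diploid individual with genotype $x_m \in \{0,1,2\}$ counting copies of that allele, independence across $M$ markers would give the log-likelihood
\[
  \ell(q) = \sum_{m=1}^{M} \log\!\left[ \binom{2}{x_m} \, \theta_m(q)^{x_m} \, \bigl(1-\theta_m(q)\bigr)^{2-x_m} \right].
\]
The Linkage Model replaces the independence across markers by a hidden Markov dependence between neighboring loci, so its likelihood no longer factorizes in this way.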
