A Statistical Test for Comparing the Linkage and Admixture Model Based on Central Limit Theorems

Carola Sophia Heinzel

TL;DR

The work addresses how to decide between the Admixture model and the Linkage model for ancestry data by establishing consistency and central limit theorems for maximum likelihood estimators in the Linkage model, enabling a principled asymptotic likelihood-ratio test. It proves identifiability and asymptotic normality, derives the test statistic $\Lambda$ with a $\chi^2_1$ null distribution, and applies the method to simulated data and 1000 Genomes AIMs to illustrate per-individual model selection and marker-set assessment. The results provide a rigorous statistical foundation for model selection in ancestry inference under time-inhomogeneous HMMs and demonstrate practical implications for population genetics analyses. The approach enables quantified uncertainty of ancestry estimates and supports targeted data-driven decisions on marker sets and recombination-rate assumptions in real genomic datasets.
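As a rough illustration of the decision rule, the sketch below compares two maximized log-likelihoods against the $\chi^2_1$ null distribution. It assumes the usual likelihood-ratio form $\Lambda = 2(\ell_{\text{Linkage}} - \ell_{\text{Admixture}})$, which the summary above does not spell out. The function names and the default level `alpha=0.05` are hypothetical choices for this example, not taken from the paper.

```python
import math


def chi2_1_sf(x: float) -> float:
    """Survival function P(X > x) of a chi-squared variable with 1 degree of freedom.

    For one degree of freedom, X = Z^2 with Z standard normal,
    so P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if x <= 0.0:
        return 1.0
    return math.erfc(math.sqrt(x / 2.0))


def likelihood_ratio_test(loglik_admixture: float,
                          loglik_linkage: float,
                          alpha: float = 0.05):
    """Return (Lambda, p-value, reject_admixture).

    Assumption: Lambda = 2 * (loglik_linkage - loglik_admixture), where both
    inputs are maximized log-likelihoods for the same individual and markers.
    """
    lam = 2.0 * (loglik_linkage - loglik_admixture)
    lam = max(lam, 0.0)  # guard against small negative values from numerical optimization
    p_value = chi2_1_sf(lam)
    return lam, p_value, p_value < alpha


# Hypothetical log-likelihood values, for illustration only.
lam, p_value, reject = likelihood_ratio_test(-100.0, -95.0)
print(f"Lambda = {lam:.2f}, p = {p_value:.4f}, reject Admixture Model: {reject}")
```

Using `math.erfc` keeps the sketch free of external dependencies. With SciPy available, `scipy.stats.chi2.sf(lam, df=1)` computes the same tail probability.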

Abstract

In the Admixture Model, the probability that an individual carries a certain allele at a specific marker depends on the allele frequencies in $K$ ancestral populations and the proportion of the individual's genome originating from these populations. The markers are assumed to be independent. The Linkage Model is a Hidden Markov Model (HMM) that extends the Admixture Model by incorporating linkage between neighboring loci. We prove consistency and asymptotic normality of maximum likelihood estimators (MLEs) for the ancestry of individuals in the Linkage Model, complementing earlier results by \citep{pfaff2004information, pfaffelhuber2022central, HEINZEL2025} for the Admixture Model. These results are used to prove that a statistical test that allows for model selection between the Admixture Model and the Linkage Model is an asymptotic level-$\alpha$ test. Finally, we demonstrate the practical relevance of our results by applying the test to real-world data from the 1000 Genomes Project.
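The Admixture Model described above can be made concrete with a minimal sketch. It assumes a haploid individual and bi-allelic markers, so that the probability of carrying the allele at marker $m$ is $\theta_m = \sum_{k=1}^{K} q_k p_{k,m}$, where $q_k$ is the ancestry proportion from population $k$ and $p_{k,m}$ is the allele frequency in that population. The function names and the toy numbers are hypothetical and serve only as illustration.

```python
import math
from typing import Sequence


def allele_probabilities(q: Sequence[float],
                         p: Sequence[Sequence[float]]) -> list:
    """theta[m] = sum_k q[k] * p[k][m] for each marker m.

    q: ancestry proportions over K populations (non-negative, summing to 1).
    p: p[k][m] is the allele frequency at marker m in ancestral population k.
    """
    n_markers = len(p[0])
    return [sum(q_k * p_k[m] for q_k, p_k in zip(q, p)) for m in range(n_markers)]


def admixture_log_likelihood(x: Sequence[int],
                             q: Sequence[float],
                             p: Sequence[Sequence[float]]) -> float:
    """Log-likelihood of haploid data x (x[m] in {0, 1}) with independent markers."""
    theta = allele_probabilities(q, p)
    # Independence across markers turns the likelihood into a product,
    # hence the log-likelihood into a sum over markers.
    return sum(math.log(t) if x_m == 1 else math.log(1.0 - t)
               for x_m, t in zip(x, theta))


# Toy example: K = 2 populations, 2 markers.
q = [0.5, 0.5]
p = [[0.2, 0.8],
     [0.6, 0.4]]
x = [1, 0]
print(allele_probabilities(q, p))
print(admixture_log_likelihood(x, q, p))
```

The Linkage Model replaces the independence across markers with a hidden Markov chain over ancestries along the genome, so its likelihood would instead be evaluated with a forward recursion, which this sketch does not cover.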

Paper Structure

This paper contains 11 sections, 13 theorems, 48 equations, 6 figures.

Key Result

Theorem 1

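Let Assumption \ref{ass:consistency:inh} hold. Then the MLE is consistent: for any $\epsilon > 0$, the probability that the MLE deviates from the true parameter by more than $\epsilon$ converges to zero.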

Figures (6)

  • Figure 1: Overview of results in this section and their requirements.
  • Figure 2: Evaluation of the statistical test by using simulated data for different values of $r$ and $d$.
  • Figure 3: Genetic distances of the markers in the AIM set by \citep{kidd2014}, rounded to integer values.
  • Figure 4: Results of the statistical test from Definition \ref{def:test} for the data from \citep{10002015global}.
  • Figure 5: Covariance matrix for the MLE in the Admixture Model for individual HG00096. The MLE for $q$ was $(1, 0, 0, 0, 0)$.
  • ...and 1 more figure

Theorems & Definitions (30)

  • Definition 2.1: Linkage Model for Haploid Individuals
  • Remark 2.2: Diploid Case
  • Definition 2.3: Admixture Model
  • Definition 2.4: Statistical Test
  • Remark 3.2: Assumptions
  • Theorem 1: Consistency of the MLE
  • Theorem 2: Central Limit Theorem for the MLE
  • Remark 3.3
  • Theorem 3: Asymptotic distribution of the test statistic
  • Remark 4.1: Test for a whole population
  • ...and 20 more