
Non-Linear Drivers of Population Dynamics: a Nonparametric Coalescent Approach

Filippo Monti, Nuno R. Faria, Xiang Ji, Philippe Lemey, Moritz U. G. Kraemer, Marc A. Suchard

TL;DR

This work tackles the challenge of reconstructing time-varying effective population sizes $N_e(t)$ from genetic data while leveraging external covariates. It replaces restrictive log-linear covariate links with a Gaussian process (GP) prior over covariate effects and couples this with a Gaussian Markov random field (GMRF) that enforces temporal smoothness on the log (or the inverse) of $N_e(t)$ over a fixed grid of change points, with all parameters inferred via Hamiltonian Monte Carlo. The approach supports piecewise-constant representations, scales to multilocus data, and estimates nonlinear covariate effects with uncertainty that varies across the covariate space. In applications to yellow fever virus dynamics in Brazil, ancient musk ox demography, and HIV-1 CRF02_AG in Cameroon, the method recovers nonlinear covariate relationships missed by linear models and yields richer, more interpretable uncertainty estimates, advancing phylodynamic inference and its use in understanding environmental and epidemiological drivers of population size over time.
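With $N_e(t)$ held constant between the points of a fixed grid, the standard coalescent log-likelihood of a time-scaled genealogy has a closed form. Each stretch of time with $k$ lineages contributes $-\binom{k}{2}\int dt / N_e(t)$, and each coalescence at time $t$ contributes $\log\big(\binom{k}{2} / N_e(t)\big)$. The sketch below evaluates this for a tree summarized by its sampling and coalescent times (measured backward from the present). It is a minimal illustration of the general technique under those assumptions, not the authors' implementation; the function names and input layout are hypothetical.

```python
import math
from bisect import bisect_right


def _segment(t, grid):
    """Index j of the grid segment [g_j, g_{j+1}) containing time t (g_0 = 0)."""
    return bisect_right(grid, t)


def _inverse_ne_integral(a, b, grid, log_ne):
    """Integral of 1 / N_e(t) over [a, b] for piecewise-constant log N_e."""
    total, t, j = 0.0, a, _segment(a, grid)
    while t < b:
        # Stop at the next change point or at b, whichever comes first.
        end = min(grid[j], b) if j < len(grid) else b
        total += (end - t) * math.exp(-log_ne[j])
        t, j = end, j + 1
    return total


def coalescent_log_likelihood(sample_times, coal_times, grid, log_ne):
    """Hypothetical helper: coalescent log-likelihood, N_e(t) constant between change points.

    Times run backward from the present (0). grid holds increasing change
    points g_1 < ... < g_{M-1}; log_ne[j] is log N_e on [g_j, g_{j+1}),
    with g_0 = 0 and g_M = infinity, so len(log_ne) == len(grid) + 1.
    """
    # Kind 0 = sampling, kind 1 = coalescence; sorting puts samples first at ties.
    events = sorted([(t, 0) for t in sample_times] + [(t, 1) for t in coal_times])
    loglik, k, t_prev = 0.0, 0, events[0][0]
    for t, kind in events:
        pairs = k * (k - 1) / 2.0
        if k >= 2:
            # Log-probability that none of the k lineages coalesce on [t_prev, t).
            loglik -= pairs * _inverse_ne_integral(t_prev, t, grid, log_ne)
        if kind == 0:
            k += 1
        else:
            if k < 2:
                raise ValueError("coalescence with fewer than two lineages")
            # Log-density of one coalescence at time t among k lineages.
            loglik += math.log(pairs) - log_ne[_segment(t, grid)]
            k -= 1
        t_prev = t
    return loglik
```

For two tips sampled at time 0 that coalesce at time 1 under a constant $N_e = 2$, this returns $-0.5 - \log 2$. Adding a redundant change point with the same $N_e$ on both sides leaves the value unchanged, which is a quick sanity check on the grid handling.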

Abstract

Effective population size, $N_e(t)$, is a fundamental parameter in population genetics and phylodynamics that quantifies genetic diversity and reveals demographic history. Coalescent-based methods enable the inference of $N_e(t)$ trajectories through time from phylogenies reconstructed from molecular sequence data. Understanding the ecological and environmental drivers of population dynamics requires linking $N_e(t)$ to external covariates. Existing approaches typically impose log-linear relationships between covariates and $N_e(t)$, which may fail to capture complex biological processes and can introduce bias when the true relationship is nonlinear. We present a flexible Bayesian framework that integrates covariates into coalescent models with piecewise-constant $N_e(t)$ through a Gaussian process (GP) prior. The GP, a distribution over functions, naturally accommodates nonlinear covariate effects without restrictive parametric assumptions. This formulation improves estimation of covariate-$N_e(t)$ relationships, mitigates bias under nonlinear associations, and yields interpretable uncertainty quantification that varies across the covariate space. To balance global covariate-driven patterns with local temporal dynamics, we couple the GP prior with a Gaussian Markov random field that enforces smoothness in $N_e(t)$ trajectories. Through simulation studies and three empirical applications (yellow fever virus dynamics in Brazil, 2016-2018; late-Quaternary musk ox demography; and HIV-1 CRF02_AG evolution in Cameroon), we demonstrate that our method both confirms linear relationships where appropriate and reveals nonlinear covariate effects that would otherwise be missed or mischaracterized. This framework advances phylodynamic inference by enabling more accurate and biologically realistic modeling of how environmental and epidemiological factors shape population size through time.
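The abstract combines two prior components: a GP over the covariate effect and a GMRF that smooths the trajectory over time. The sketch below evaluates the log prior density of one possible combination. Every modeling choice in it is an illustrative assumption rather than the paper's stated parameterization: a squared-exponential kernel, a first-order random-walk (RW1) GMRF, one scalar covariate value per grid interval, and an additive split of $\log N_e$ on the grid into a GP term plus a GMRF term. All function names are hypothetical.

```python
import math


def se_kernel(x1, x2, amplitude, length_scale):
    """Squared-exponential covariance a^2 * exp(-(x1 - x2)^2 / (2 l^2))."""
    return amplitude ** 2 * math.exp(-((x1 - x2) ** 2) / (2.0 * length_scale ** 2))


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][m] * L[j][m] for m in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L


def gp_log_prior(f, x, amplitude, length_scale, jitter=1e-8):
    """log N(f | 0, K), K_ij = k(x_i, x_j) plus jitter on the diagonal."""
    n = len(x)
    K = [[se_kernel(x[i], x[j], amplitude, length_scale) + (jitter if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    # Forward substitution solves L z = f, so that f^T K^{-1} f = z^T z.
    z = []
    for i in range(n):
        z.append((f[i] - sum(L[i][m] * z[m] for m in range(i))) / L[i][i])
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (sum(v * v for v in z) + log_det + n * math.log(2.0 * math.pi))


def rw1_gmrf_log_prior(w, precision):
    """Intrinsic RW1 GMRF: increments w[j+1] - w[j] ~ N(0, 1 / precision).

    The prior is improper in the overall level of w; only increments enter.
    """
    m = len(w) - 1
    sq = sum((w[j + 1] - w[j]) ** 2 for j in range(m))
    return 0.5 * m * math.log(precision / (2.0 * math.pi)) - 0.5 * precision * sq


def log_prior(log_ne, covariate, gp_effect, amplitude, length_scale, precision):
    """Assumed additive split: log_ne[j] = gp_effect[j] + w[j], with w ~ RW1 GMRF."""
    w = [a - b for a, b in zip(log_ne, gp_effect)]
    return (gp_log_prior(gp_effect, covariate, amplitude, length_scale)
            + rw1_gmrf_log_prior(w, precision))
```

In an HMC sampler, these log-prior terms would be added to a coalescent log-likelihood, and gradients with respect to all parameters would be needed (for example via automatic differentiation or analytic derivatives). The pure-Python loops here favor readability over speed.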

Paper Structure

This paper contains 47 sections, 25 equations, and 5 figures.

Figures (5)

  • Figure 1: Simulations: $\log N_e(t)$ vs. Covariates
  • Figure 2: Data Examples: $\log N_e(t)$ vs. Covariates
  • Figure 3: Yellow Fever Virus: Tree and Demography
  • Figure 4: Ancient Musk Ox: Tree and Demography
  • Figure 5: HIV Example: Demography