Longitudinal prediction of DNA methylation to forecast epigenetic outcomes
Arthur Leroy, Ai Ling Teh, Frank Dondelinger, Mauricio A. Alvarez, Dennis Wang
TL;DR
This study addresses the challenge of forecasting DNA methylation trajectories in early life by introducing a probabilistic, longitudinal framework based on multi-mean Gaussian processes that share information across individuals and CpGs. The model decomposes each time series $y_i^j(t)$ into the sum of a common mean $μ_0(t)$, an individual-specific component $f_i(t)$, a CpG-specific component $g^j(t)$, and noise $ε_i^j(t)$, which yields a posterior predictive distribution for $y_i^j(t)$ with quantified uncertainty. Predictions of methylation at 72 months from earlier time points show high accuracy ($r$ ≈ 0.99, Spearman ≈ 0.98) and well-calibrated uncertainty. Epigenetic ages estimated from predicted methylation via the Horvath skin & blood and PedBE clocks align closely with true ages, albeit with clock-specific biases. Age acceleration derived from predicted methylation is associated with health measures such as moderate-to-vigorous physical activity (MVPA) and diastolic blood pressure, illustrating practical links between forecasted epigenetic states and developmental health. Overall, the approach enables imputation of missing methylation data, supports longitudinal epigenetic analyses of development and aging, and scales to genome-wide CpGs with calibrated uncertainty.
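Written out as a display equation, the decomposition in the summary above reads as follows. The additive form and the symbols are taken from the text; the zero-mean Gaussian noise with variance $\sigma^2$ is an assumption added here for illustration and is not stated in the summary.

$$
y_i^j(t) = \mu_0(t) + f_i(t) + g^j(t) + \varepsilon_i^j(t), \qquad \varepsilon_i^j(t) \sim \mathcal{N}(0, \sigma^2) \quad \text{(assumed)}
$$

Here $i$ indexes individuals and $j$ indexes CpGs, so $f_i$ is shared across all CpGs of one child and $g^j$ is shared across all children for one CpG.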
Abstract
Interrogating the evolution of biological changes at early stages of life requires longitudinal profiling of molecules, such as DNA methylation, which can be challenging to carry out with children. We introduce a probabilistic and longitudinal machine learning framework based on multi-mean Gaussian processes (GPs), accounting for individual and gene correlations across time. This method predicts future DNA methylation status at different individual ages while accounting for uncertainty. Our model is trained on a birth cohort of children with methylation profiled at ages 0-4, and we demonstrate that the status of methylation sites for each child can be accurately predicted at ages 5-7. We show that methylation profiles predicted by multi-mean GPs can be used to estimate other phenotypes, such as epigenetic age, and enable comparison with other health measures of interest. This approach encourages epigenetic studies to move towards longitudinal designs for investigating epigenetic changes during development, ageing and disease progression.
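To make "future predictions with uncertainty" concrete, here is a minimal sketch of Gaussian process forecasting for one child and one CpG, written with NumPy. It is not the authors' implementation. It simplifies the multi-mean setting by treating the shared mean as a known function (`shared_mean`, hypothetical), whereas the framework described above learns mean processes from the cohort. The kernel choice, hyperparameter values, time points, and methylation values are all made up for illustration.

```python
import numpy as np

def rbf_kernel(t1, t2, variance, lengthscale):
    """Squared-exponential covariance between two vectors of time points."""
    d = np.asarray(t1, dtype=float)[:, None] - np.asarray(t2, dtype=float)[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def gp_forecast(t_obs, y_obs, t_new, prior_mean,
                variance=0.01, lengthscale=12.0, noise_var=1e-4):
    """Gaussian posterior predictive mean and variance at t_new.

    prior_mean is a callable standing in for the shared mean; assuming it
    is known is a simplification of the multi-mean GP setting.
    """
    t_obs = np.asarray(t_obs, dtype=float)
    t_new = np.asarray(t_new, dtype=float)
    K = rbf_kernel(t_obs, t_obs, variance, lengthscale) + noise_var * np.eye(len(t_obs))
    K_s = rbf_kernel(t_new, t_obs, variance, lengthscale)
    K_ss = rbf_kernel(t_new, t_new, variance, lengthscale)

    # Condition on the residuals around the prior mean.
    resid = np.asarray(y_obs, dtype=float) - prior_mean(t_obs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))  # K^{-1} resid

    mean = prior_mean(t_new) + K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = np.diag(K_ss - v.T @ v) + noise_var  # includes observation noise
    return mean, var

# Hypothetical inputs: ages in months and methylation beta values.
shared_mean = lambda t: 0.5 + 0.001 * np.asarray(t, dtype=float)
t_obs = [0, 12, 24, 36, 48]
y_obs = [0.52, 0.50, 0.55, 0.53, 0.56]

mean, var = gp_forecast(t_obs, y_obs, [72.0], shared_mean)
half_width = 1.96 * np.sqrt(var)
print(f"72 months: {mean[0]:.3f} +/- {half_width[0]:.3f} (95% interval)")
```

In this sketch the forecast reverts towards the shared mean as the target age moves away from the observed time points, and the predictive variance grows accordingly. This is why a mean informed by other individuals and CpGs matters for long-range prediction.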
