Longitudinal prediction of DNA methylation to forecast epigenetic outcomes

Arthur Leroy, Ai Ling Teh, Frank Dondelinger, Mauricio A. Alvarez, Dennis Wang

TL;DR

This study addresses the challenge of forecasting DNA methylation trajectories in early life by introducing a probabilistic, longitudinal framework based on multi-mean Gaussian processes that share information across individuals and CpGs. The model writes the methylation time series of individual $i$ at CpG $j$ as $y_i^j(t) = \mu_0(t) + f_i(t) + g^j(t) + \varepsilon_i^j(t)$: the sum of a common mean process $\mu_0(t)$, an individual-specific component $f_i(t)$, a CpG-specific component $g^j(t)$, and noise $\varepsilon_i^j(t)$. This yields posterior predictive distributions for $y_i^j(t)$ with quantified uncertainty. Predictions of methylation at 72 months from earlier time points are highly accurate ($r \approx 0.99$, Spearman $\approx 0.98$) with well-calibrated uncertainty, and epigenetic ages estimated from predicted methylation with Horvath's skin&blood clock and the PedBE clock align closely with true ages, albeit with clock-specific biases. Age acceleration derived from predicted methylation is associated with health measures such as moderate to vigorous physical activity (MVPA) and diastolic blood pressure, illustrating practical links between forecasted epigenetic states and developmental health. Overall, the approach enables imputation of missing methylation data, supports longitudinal epigenetic analyses of development and aging, and scales to genome-wide sets of CpGs with calibrated uncertainty.
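To make the additive decomposition concrete, here is a minimal NumPy sketch under a simplified, fully Gaussian reading of the model: $\mu_0$, each $f_i$ and each $g^j$ are treated as independent zero-mean GPs with squared-exponential kernels and hand-picked hyperparameters. Under that assumption two observations always share covariance through $\mu_0$, through $f_i$ only when they come from the same individual, and through $g^j$ only when they come from the same CpG, i.e. $\mathrm{cov}(y_i^j(t), y_k^l(s)) = k_0(t,s) + [i=k]\,k_f(t,s) + [j=l]\,k_g(t,s)$ plus noise on the diagonal. This is not the authors' multi-mean GP inference, which learns the mean processes and hyperparameters from the cohort; the function names (`rbf`, `joint_cov`, `predict`), kernel choice and parameter values are illustrative assumptions.

```python
import numpy as np

# Hypothetical, hand-picked (not fitted) kernel hyperparameters: (variance, lengthscale in months).
PARAMS = {"mu0": (0.01, 24.0), "f": (0.002, 12.0), "g": (0.005, 36.0)}
NOISE_VAR = 1e-4  # variance of the noise term eps_i^j(t)


def rbf(t, s, variance, lengthscale):
    """Squared-exponential kernel matrix between time vectors t and s."""
    d = t[:, None] - s[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def joint_cov(t1, ind1, cpg1, t2, ind2, cpg2, params=PARAMS):
    """Noise-free covariance of y_i^j(t) = mu0(t) + f_i(t) + g^j(t) between two sets of
    (time, individual, CpG) triples, with mu0, f_i and g^j independent zero-mean GPs."""
    same_ind = ind1[:, None] == ind2[None, :]  # f_i is shared only within an individual
    same_cpg = cpg1[:, None] == cpg2[None, :]  # g^j is shared only within a CpG
    return (rbf(t1, t2, *params["mu0"])        # mu0 is shared by every observation
            + same_ind * rbf(t1, t2, *params["f"])
            + same_cpg * rbf(t1, t2, *params["g"]))


def predict(t, ind, cpg, y, t_new, ind_new, cpg_new, params=PARAMS, noise_var=NOISE_VAR):
    """Posterior predictive mean and variance of y at new (time, individual, CpG) triples."""
    y_mean = y.mean()  # centre the beta values, since the GPs above are zero-mean
    K = joint_cov(t, ind, cpg, t, ind, cpg, params) + noise_var * np.eye(t.size)
    K_s = joint_cov(t_new, ind_new, cpg_new, t, ind, cpg, params)
    K_ss = joint_cov(t_new, ind_new, cpg_new, t_new, ind_new, cpg_new, params)
    L = np.linalg.cholesky(K)                                     # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))  # K^{-1} (y - y_mean)
    v = np.linalg.solve(L, K_s.T)                                 # L^{-1} K_s^T
    mean = y_mean + K_s @ alpha
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0) + noise_var     # includes observation noise
    return mean, var


# Toy data: 3 individuals x 2 CpGs, each observed at the same 7 ages (months).
rng = np.random.default_rng(0)
times = np.array([0.0, 3.0, 6.0, 12.0, 24.0, 36.0, 48.0])
n_ind, n_cpg = 3, 2
t = np.tile(times, n_ind * n_cpg)
ind = np.repeat(np.arange(n_ind), n_cpg * times.size)
cpg = np.tile(np.repeat(np.arange(n_cpg), times.size), n_ind)
y = 0.5 + 0.001 * t + 0.02 * rng.standard_normal(t.size)  # synthetic beta values

# Forecast individual 0 at CpG 1 at 72 months, with a 95% predictive interval.
mean, var = predict(t, ind, cpg, y, np.array([72.0]), np.array([0]), np.array([1]))
sd = np.sqrt(var[0])
print(f"{mean[0]:.3f} +/- {1.96 * sd:.3f}")
```

The brute-force Cholesky over every observation costs $O(n^3)$ and would not scale to genome-wide CpGs; the sketch is only meant to show how sharing through $\mu_0$, $f_i$ and $g^j$ shapes the covariance.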

Abstract

Interrogating the evolution of biological changes at early stages of life requires longitudinal profiling of molecules, such as DNA methylation, which can be challenging in children. We introduce a probabilistic and longitudinal machine learning framework based on multi-mean Gaussian processes (GPs), accounting for individual and gene correlations across time. This method predicts future DNA methylation status at different ages for each individual while accounting for uncertainty. Our model is trained on a birth cohort of children with methylation profiled at ages 0-4, and we demonstrate that the status of methylation sites for each child can be accurately predicted at ages 5-7. We show that methylation profiles predicted by multi-mean GPs can be used to estimate other phenotypes, such as epigenetic age, and enable comparison to other health measures of interest. This approach encourages epigenetic studies to move towards longitudinal designs for investigating epigenetic changes during development, ageing and disease progression.
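Judging whether methylation at later ages is "accurately predicted" while "accounting for uncertainty" usually means comparing predicted and measured values at the held-out age, both for accuracy (Pearson's $r$ and Spearman's rank correlation) and for calibration (how often the measured value falls inside the 95% predictive interval). The NumPy sketch below computes these three quantities; the function names, the Gaussian mean $\pm 1.96\,\sigma$ interval and the toy inputs are assumptions for illustration, not the paper's evaluation code or data.

```python
import numpy as np


def rankdata(x):
    """Ranks starting at 1, with tied values given their average rank."""
    x = np.asarray(x, dtype=float)
    ranks = np.empty(x.size)
    ranks[np.argsort(x, kind="mergesort")] = np.arange(1, x.size + 1)
    _, inv = np.unique(x, return_inverse=True)
    inv = inv.ravel()
    # Average the ordinal ranks within each group of tied values.
    return (np.bincount(inv, weights=ranks) / np.bincount(inv))[inv]


def evaluate(observed, pred_mean, pred_var, z=1.96):
    """Accuracy (Pearson r, Spearman rho) and empirical coverage of mean +/- z*sd intervals."""
    observed = np.asarray(observed, dtype=float)
    pred_mean = np.asarray(pred_mean, dtype=float)
    pred_sd = np.sqrt(np.asarray(pred_var, dtype=float))
    pearson = np.corrcoef(observed, pred_mean)[0, 1]
    spearman = np.corrcoef(rankdata(observed), rankdata(pred_mean))[0, 1]
    inside = np.abs(observed - pred_mean) <= z * pred_sd
    return {"pearson": float(pearson), "spearman": float(spearman),
            "coverage_95": float(inside.mean())}


# Toy held-out values (not the paper's data): measured vs predicted beta values.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.11, 0.19, 0.32, 0.41]
print(evaluate(observed, predicted, pred_var=[1e-4] * 4))
```

With well-calibrated uncertainty, coverage computed over many held-out CpG sites should sit close to 0.95; markedly lower coverage indicates overconfident intervals.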

Paper Structure

This paper contains 18 sections, 4 equations, 12 figures, and 3 tables.

Figures (12)

  • Figure 1: Schematic overview of predicting methylation profiles and its application. Methylation values were not collected at the time points when clinical measures were taken. The missing methylation values at the CpG sites needed to compute epigenetic age can be predicted with multi-mean Gaussian processes from longitudinal data collected at earlier time points. Epigenetic age computed from the predicted methylation values is then used to test for association with a clinical measure (here, diastolic blood pressure).
  • Figure 2: CpG-specific mean processes and individual-specific predictions. CpG-specific mean processes (dashed lines) with associated 95% credible intervals (pink bands) differ between two illustrative CpGs: cg00609333 (A) and cg06430061 (B). Multi-mean GP prediction curves (pink) with associated 95% credible intervals (pink bands) for two illustrative individuals (C & D). The dashed line represents the mean curve from the CpG-specific mean process. Observed points are coloured black, and the predicted point is red. Background points are the training observations, coloured by individual.
  • Figure 3: Predicted versus measured methylation values for the 91 CpGs of the PedBE clock, shown for 6 example individuals from the testing set. Predicted methylation is plotted on the y-axis and observed methylation on the x-axis. Each dot represents a CpG in the PedBE epigenetic clock signature. The red dotted line is the identity line (y = x).
  • Figure 4: Epigenetic age computed from predicted versus observed methylation values, and the variance of the predictions, for 188 testing samples. Epigenetic ages computed from predicted methylation values are plotted against epigenetic ages computed from observed methylation values for the PedBE clock (A) and the skin&blood clock (B). The variance of each epigenetic age prediction (its uncertainty) is plotted against its error (the difference between epigenetic age computed from predicted and from observed methylation values) for the 188 individuals, for PedBE (C) and skin&blood (D) respectively. For each individual (sorted by increasing uncertainty on the y-axis), the mean epigenetic age obtained with the PedBE or skin&blood clock from methylation values predicted at 6 years serves as a reference and is displayed as a purple line; the pink region is the associated 95% credible interval; each red dot is the epigenetic age computed from the observed methylation values.
  • Figure 5: Scatter plot of age acceleration (AA) against moderate to vigorous physical activity (MVPA) measured at age 5.5 years. Each dot represents a subject in the testing set. The red dotted line is the regression line.
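The pipeline in Figures 1, 4 and 5 (predicted methylation, then epigenetic age with uncertainty, then age acceleration, then association with a health measure) can be sketched as follows. This NumPy sketch assumes a linear clock (intercept plus CpG weights) followed by a Horvath-style inverse age transformation with adult age 20, propagates uncertainty by Monte Carlo sampling while treating CpGs as independent Gaussians given their predictive means and variances (ignoring cross-CpG covariance and not truncating samples to [0, 1]), and defines age acceleration as the residual of epigenetic age regressed on chronological age, which is one common definition. The clock coefficients, helper names and toy numbers are hypothetical; real PedBE or skin&blood coefficients, their age transformations, and the paper's exact age-acceleration definition should be taken from their sources.

```python
import numpy as np

ADULT_AGE = 20.0  # constant of Horvath's age transformation (assumed here for any clock)


def inverse_age_transform(x, adult_age=ADULT_AGE):
    """Map a clock's linear predictor back to age in years (Horvath-style inverse transform)."""
    x = np.asarray(x, dtype=float)
    return np.where(x < 0, (1 + adult_age) * np.exp(x) - 1, (1 + adult_age) * x + adult_age)


def clock_age(beta, intercept, weights):
    """Epigenetic age from beta values: linear predictor, then inverse age transform."""
    return inverse_age_transform(intercept + beta @ weights)


def age_with_uncertainty(pred_mean, pred_var, intercept, weights, n_samples=2000, seed=0):
    """Monte Carlo propagation of (independent, Gaussian) CpG predictive uncertainty to age."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((n_samples, pred_mean.size))
    samples = pred_mean + np.sqrt(pred_var) * noise  # one row per sampled methylation profile
    ages = clock_age(samples, intercept, weights)
    lo, hi = np.percentile(ages, [2.5, 97.5])        # 95% interval for epigenetic age
    return ages.mean(), ages.var(), (lo, hi)


def age_acceleration(epi_age, chrono_age):
    """Residuals of epigenetic age regressed linearly on chronological age."""
    slope, intercept = np.polyfit(chrono_age, epi_age, 1)
    return epi_age - (slope * chrono_age + intercept)


def association_slope(aa, measure):
    """Least-squares slope of age acceleration on a health measure (e.g. MVPA)."""
    slope, _ = np.polyfit(measure, aa, 1)
    return slope


# Toy usage with a hypothetical 3-CpG clock (not PedBE or skin&blood coefficients).
weights = np.array([1.2, -0.8, 0.5])
pred_mean = np.array([0.62, 0.35, 0.48])
pred_var = np.array([4e-4, 2e-4, 3e-4])
age_mean, age_var, (age_lo, age_hi) = age_with_uncertainty(pred_mean, pred_var, -1.8, weights)
print(f"epigenetic age {age_mean:.2f} y, 95% interval [{age_lo:.2f}, {age_hi:.2f}]")

chrono = np.array([5.9, 6.0, 6.1, 6.2])    # toy chronological ages (years)
epi = np.array([6.3, 5.8, 6.4, 6.0])       # toy epigenetic ages (years)
mvpa = np.array([40.0, 55.0, 35.0, 60.0])  # toy MVPA values
print(association_slope(age_acceleration(epi, chrono), mvpa))
```

Sampling rather than linearising keeps the sketch valid through the nonlinear age transformation; a full treatment would sample from the joint predictive covariance across CpGs instead of independent marginals.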