iTARGET: Interpretable Tailored Age Regression for Grouped Epigenetic Traits

Zipeng Wu, Daniel Herring, Fabian Spill, James Andrews

TL;DR

This work tackles the challenge of predicting chronological age from DNA methylation in the presence of Epigenetic Correlation Drift (ECD) and Heterogeneity Among CpGs (HAC). It introduces iTARGET, a two-phase approach that first clusters samples into age groups using FAISS and then trains age-group-specific predictors on 30 CpG features with Explainable Boosting Machines (EBM), capturing both main effects and CpG interactions. The method is more accurate than pretrained epigenetic clocks and standard ML baselines: decade-based grouping achieves an MAE of 3.7752, and a biologically informed four-segment grouping comes close to that performance. It also provides interpretable biomarker insights and decade-specific aging dynamics. For aging research, the practical benefit is a precise, interpretable estimate of biological age together with the key CpG sites and interactions at each life stage.

Abstract

Accurately predicting chronological age from DNA methylation patterns is crucial for advancing biological age estimation. However, this task is made challenging by Epigenetic Correlation Drift (ECD) and Heterogeneity Among CpGs (HAC), which reflect the dynamic relationship between methylation and age across different life stages. To address these issues, we propose a novel two-phase algorithm. The first phase employs similarity searching to cluster methylation profiles by age group, while the second phase uses Explainable Boosting Machines (EBM) for precise, group-specific prediction. Our method not only improves prediction accuracy but also reveals key age-related CpG sites, detects age-specific changes in aging rates, and identifies pairwise interactions between CpG sites. Experimental results show that our approach outperforms traditional epigenetic clocks and machine learning models, offering a more accurate and interpretable solution for biological age estimation with significant implications for aging research.
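
The two-phase flow described above can be illustrated with a minimal sketch. It is not the authors' implementation: a brute-force nearest-neighbour vote stands in for the FAISS similarity search, and a per-group mean age stands in for the group-specific EBM regressor. All function names, the choice of k, and the toy methylation values are assumptions made for illustration only.

```python
import math
from collections import Counter
from statistics import fmean


def decade_label(age):
    """Map an age in years to the lower bound of its decade window."""
    return int(age // 10) * 10


def nearest_group(query, profiles, labels, k=3):
    """Phase 1 stand-in: brute-force k-nearest-neighbour vote over group labels.

    The paper uses FAISS for similarity search; this loop only mimics the idea.
    """
    order = sorted(range(len(profiles)), key=lambda i: math.dist(query, profiles[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def fit_group_models(ages, labels):
    """Phase 2 stand-in: one trivial predictor (the group's mean age) per group.

    The paper trains an EBM per age group; the mean is a placeholder.
    """
    by_group = {}
    for age, label in zip(ages, labels):
        by_group.setdefault(label, []).append(age)
    return {label: fmean(group_ages) for label, group_ages in by_group.items()}


def predict_age(query, profiles, labels, models, k=3):
    """Route the query to its nearest age group, then apply that group's model."""
    return models[nearest_group(query, profiles, labels, k)]


def mean_absolute_error(predicted, actual):
    """MAE: the average absolute gap between predicted and true ages."""
    return fmean(abs(p - a) for p, a in zip(predicted, actual))


# Toy data (invented): two CpG values per sample, two well-separated age groups.
TOY_PROFILES = [[0.10, 0.90], [0.15, 0.85], [0.20, 0.80],
                [0.80, 0.20], [0.85, 0.15], [0.90, 0.10]]
TOY_AGES = [22, 25, 28, 61, 65, 69]
TOY_LABELS = [decade_label(a) for a in TOY_AGES]
TOY_MODELS = fit_group_models(TOY_AGES, TOY_LABELS)

if __name__ == "__main__":
    print(predict_age([0.12, 0.88], TOY_PROFILES, TOY_LABELS, TOY_MODELS))
```

The point of the structure is the routing step: a sample is first assigned to an age group by similarity, and only that group's predictor is consulted, so each predictor can specialise in one life stage.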

Paper Structure

This paper contains 15 sections, 7 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Overview of our proposed method.
  • Figure 2: Box plot of absolute Pearson correlation coefficients of 2,374 age-related CpG sites across different decade windows.
  • Figure 3: Global and local feature importance (contribution) for age group 20.0-30.0. The global feature importance highlights the CpG sites that have the most significant overall contribution to the age prediction model, while the local feature importance shows how these CpG sites contribute to individual predictions within this age group.
  • Figure 4: Top two CpG sites for age group 20.0-30.0. For each of the two CpG sites with the strongest influence on age prediction, the figure shows the site's contribution to the prediction as a function of its value, together with the distribution of its values.
  • Figure 5: Heatmaps illustrating the interaction effects between CpG sites for the age group 20.0-30.0. Each heatmap displays the contribution of the interaction between two CpG sites to the predictive model, with colors closer to yellow indicating a stronger positive contribution and colors closer to purple indicating a stronger negative contribution. This visualization highlights how CpG site interactions affect age prediction within this specific age group.
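
The quantity summarised in Figure 2, the absolute Pearson correlation between a CpG site's methylation and age within each decade window, can be sketched as follows for a single site. This is an illustrative sketch only: the function names, the fixed 10-year binning rule, the minimum sample count, and the toy values are assumptions, not details taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns NaN if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return float("nan")
    return cov / (spread_x * spread_y)


def abs_corr_by_decade(ages, methylation, min_samples=3):
    """Absolute Pearson correlation between age and methylation per decade window.

    Windows with fewer than min_samples samples are skipped (an assumed rule).
    """
    windows = {}
    for age, value in zip(ages, methylation):
        windows.setdefault(int(age // 10) * 10, []).append((age, value))
    result = {}
    for start, pairs in sorted(windows.items()):
        if len(pairs) < min_samples:
            continue
        window_ages, window_values = zip(*pairs)
        result[start] = abs(pearson(window_ages, window_values))
    return result


# Toy data (invented): the site tracks age closely in the 20s but not in the 60s.
TOY_AGES = [21, 24, 27, 29, 61, 64, 67, 69]
TOY_METHYLATION = [0.21, 0.24, 0.27, 0.29, 0.50, 0.52, 0.49, 0.51]

if __name__ == "__main__":
    for start, value in abs_corr_by_decade(TOY_AGES, TOY_METHYLATION).items():
        print(f"{start}-{start + 10}: |r| = {value:.3f}")
```

Repeating this calculation over many age-related CpG sites and plotting the per-window values as box plots would give a view comparable in spirit to Figure 2, where a shift in the distribution between windows indicates that the methylation-age relationship is not stable across life stages.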