iTARGET: Interpretable Tailored Age Regression for Grouped Epigenetic Traits
Zipeng Wu, Daniel Herring, Fabian Spill, James Andrews
TL;DR
This work tackles the challenge of predicting chronological age from DNA methylation in the presence of Epigenetic Correlation Drift (ECD) and Heterogeneity Among CpGs (HAC). It introduces iTARGET, a two-phase approach that first clusters samples into age groups using FAISS and then trains age-group-specific predictors on 30 CpG features with Explainable Boosting Machines (EBM), capturing both main effects and pairwise CpG interactions. The method is more accurate than pretrained epigenetic clocks and standard ML baselines: a decade-based grouping achieves an MAE of 3.7752, and a biologically informed four-segment grouping performs nearly as well. It also provides interpretable biomarker insights and reveals decade-specific aging dynamics. For aging research, the approach delivers precise, interpretable estimates of biological age and identifies key CpG sites and interactions across life stages.
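A minimal sketch of the two grouping schemes and the reported metric, assuming ages in years and integer group labels. The helpers `decade_group`, `cutpoint_group` and `mean_absolute_error` are hypothetical illustrations, not the authors' code. The boundaries of the four-segment grouping are not stated in this summary, so the cut points in the usage line are placeholders, and the ages are made-up values rather than data from the study.

```python
from bisect import bisect_right


def decade_group(age: float) -> int:
    """Decade-based group label: 0 for ages 0-9, 1 for 10-19, and so on."""
    return int(age // 10)


def cutpoint_group(age: float, cutpoints: list[float]) -> int:
    """Group label for arbitrary sorted cut points (e.g. a four-segment scheme).

    With cutpoints [a, b, c]: age < a -> 0, a <= age < b -> 1,
    b <= age < c -> 2, age >= c -> 3.
    """
    return bisect_right(cutpoints, age)


def mean_absolute_error(y_true: list[float], y_pred: list[float]) -> float:
    """MAE = (1/n) * sum(|y_i - yhat_i|)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Usage with made-up ages (not data from the paper).
ages = [7, 23, 45, 68, 81]
print([decade_group(a) for a in ages])                  # [0, 2, 4, 6, 8]
# HYPOTHETICAL cut points; the paper's four-segment boundaries are not given here.
print([cutpoint_group(a, [20, 40, 60]) for a in ages])  # [0, 1, 2, 3, 3]
print(mean_absolute_error([30, 50, 70], [33, 48, 70]))  # 1.6666666666666667
```

Writing the grouping as a cut-point lookup lets the decade-based and four-segment schemes share one code path; only the list of boundaries changes.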
Abstract
Accurately predicting chronological age from DNA methylation patterns is crucial for advancing biological age estimation. However, this task is made challenging by Epigenetic Correlation Drift (ECD) and Heterogeneity Among CpGs (HAC), which reflect the dynamic relationship between methylation and age across different life stages. To address these issues, we propose a novel two-phase algorithm. The first phase employs similarity searching to cluster methylation profiles by age group, while the second phase uses Explainable Boosting Machines (EBM) for precise, group-specific prediction. Our method not only improves prediction accuracy but also reveals key age-related CpG sites, detects age-specific changes in aging rates, and identifies pairwise interactions between CpG sites. Experimental results show that our approach outperforms traditional epigenetic clocks and machine learning models, offering a more accurate and interpretable solution for biological age estimation with significant implications for aging research.
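One plausible reading of the two-phase design, sketched with the Python standard library only; all names are hypothetical. Phase 1 uses an exact brute-force L2 nearest-neighbour search, which is what a flat FAISS index (`IndexFlatL2`) computes, and assigns a query the majority age group of its neighbours. Phase 2 is a deliberate stand-in: instead of a group-specific EBM (available, for example, as `ExplainableBoostingRegressor` in the `interpret` package), it averages the ages of the nearest training profiles within the assigned group. The toy two-CpG profiles are invented for illustration.

```python
import math
from collections import Counter

Profile = list[float]  # beta values for a fixed panel of CpG sites


def l2(a: Profile, b: Profile) -> float:
    """Euclidean distance between two methylation profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn(query: Profile, profiles: list[Profile], k: int) -> list[int]:
    """Indices of the k nearest profiles by exact brute-force L2 search."""
    order = sorted(range(len(profiles)), key=lambda i: l2(query, profiles[i]))
    return order[:k]


def assign_group(query: Profile, profiles: list[Profile], groups: list[int], k: int = 3) -> int:
    """Phase 1: majority vote over the age groups of the k most similar profiles."""
    votes = Counter(groups[i] for i in knn(query, profiles, k))
    return votes.most_common(1)[0][0]


def predict_age(query: Profile, profiles: list[Profile], ages: list[float],
                groups: list[int], k_group: int = 3, k_reg: int = 2) -> tuple[int, float]:
    """Phase 2: predict using only training samples from the assigned group.

    STAND-IN for a group-specific EBM: the mean age of the k_reg nearest
    profiles inside that group.
    """
    g = assign_group(query, profiles, groups, k_group)
    members = [i for i, gi in enumerate(groups) if gi == g]
    nearest = knn(query, [profiles[i] for i in members], min(k_reg, len(members)))
    return g, sum(ages[members[j]] for j in nearest) / len(nearest)


# Toy data (invented): three clusters of two-CpG profiles with decade labels.
profiles = [[0.10, 0.90], [0.12, 0.88], [0.15, 0.85],
            [0.50, 0.50], [0.52, 0.48], [0.55, 0.45],
            [0.80, 0.20], [0.82, 0.18]]
ages = [22, 25, 28, 41, 44, 47, 63, 66]
groups = [int(a // 10) for a in ages]  # decade-based labels: 2, 4, 6

print(predict_age([0.51, 0.50], profiles, ages, groups))  # (4, 42.5)
```

Swapping in FAISS and EBM would replace `knn` with an index search and the averaging step with one fitted regressor per group; the group-then-predict structure stays the same.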
