Artificial Intelligence and Deep Learning Algorithms for Epigenetic Sequence Analysis: A Review for Epigeneticists and AI Experts
Muhammad Tahir, Mahboobeh Norouzi, Shehroz S. Khan, James R. Davie, Soichiro Yamanaka, Ahmed Ashraf
TL;DR
The paper surveys the intersection of artificial intelligence and deep learning with epigenetic sequence analysis, presenting a dual-perspective taxonomy that helps AI researchers identify tractable epigenetic problems and helps epigeneticists map those problems to suitable AI paradigms. It covers data types and public resources, reviews core DL architectures (CNNs, RNNs/LSTMs, autoencoders, transformers) and their applicability to epigenetic tasks, and systematically organizes literature by problem category: disease-marker prediction, gene expression, enhancer–promoter interactions, chromatin state discovery, and representation learning. The review highlights representative methods (e.g., DISMIR, DeepHistone, DeepChrome, SPEID, ChromeGCN, DeepC, ChromTransfer) and reports high performance in several domains, while also identifying pervasive challenges such as data imbalance and cross-dataset generalization. It offers concrete recommendations on data augmentation, contrastive learning, transfer learning, model interpretability, and wet-lab validation to advance robust, generalizable epigenetic AI solutions with potential clinical impact.
Abstract
Epigenetics encompasses mechanisms that can alter the expression of genes without changing the underlying genetic sequence. The epigenetic regulation of gene expression is initiated and sustained by several mechanisms such as DNA methylation, histone modifications, chromatin conformation, and non-coding RNA. Changes in gene regulation and expression can manifest as various diseases and disorders, such as cancer and congenital deformities. Over the last few decades, high-throughput experimental approaches have been used to identify and understand epigenetic changes, but these laboratory approaches and biochemical processes are time-consuming and expensive. To overcome these challenges, machine learning and artificial intelligence (AI) approaches have been used extensively to map epigenetic modifications to their phenotypic manifestations. In this paper, we provide a narrative review of published research on AI models trained on epigenomic data to address a variety of problems, such as the prediction of disease markers, gene expression, enhancer-promoter interactions, and chromatin states. The review has a twofold purpose, as it is addressed to both AI experts and epigeneticists. For AI researchers, we provide a taxonomy of epigenetics research problems that can benefit from an AI-based approach. For epigeneticists, we provide, for each of these problems, a list of candidate AI solutions from the literature. We also identify several gaps in the literature and research challenges, and offer recommendations to address these challenges.
