A Novel Approach to Linking Histology Images with DNA Methylation

Manahil Raza, Muhammad Dawood, Talha Qaiser, Nasir M. Rajpoot

TL;DR

This work introduces SlideGraph$^{methyl}$, a weakly supervised graph neural network framework that predicts the differential DNA methylation state of gene groups from whole-slide histology images (WSIs). The method builds a WSI-level graph from patch features and trains with a pairwise ranking objective; it reports higher AUROC and AP than state-of-the-art baselines on TCGA glioma and renal carcinoma cohorts. Gene set enrichment analysis (GSEA) indicates that the gene groups are biologically meaningful, and spatially resolved heatmaps highlight the regions that contribute most to each prediction. The results suggest that spatial histopathology patterns can serve as digital biomarkers for epigenetic states, potentially enabling faster, image-based cancer stratification alongside traditional methylation assays. The study also links visual patterns to methylation-driven pathways, and the authors plan to extend it to multi-modal data and additional cancer types.

Abstract

DNA methylation is an epigenetic mechanism that regulates gene expression by adding methyl groups to DNA. Abnormal methylation patterns can disrupt gene expression and have been linked to cancer development. DNA methylation is typically quantified with specialized assays. However, these assays are often costly and have lengthy processing times, which limits their availability in routine clinical practice. In contrast, whole slide images (WSIs) are readily available for the majority of cancer patients, so there is a compelling need to explore the potential relationship between WSIs and DNA methylation patterns. To address this, we propose an end-to-end, graph neural network based, weakly supervised learning framework to predict the methylation state of gene groups exhibiting coherent patterns across samples. Using data from three cohorts from The Cancer Genome Atlas (TCGA) - TCGA-LGG (Brain Lower Grade Glioma) and TCGA-GBM (Glioblastoma Multiforme) ($n$=729 combined), and TCGA-KIRC (Kidney Renal Clear Cell Carcinoma) ($n$=511) - we demonstrate that the proposed approach achieves significantly higher AUROC scores than the state-of-the-art (SOTA) methods, by more than $20\%$. We conduct gene set enrichment analyses on the gene groups and show that the majority of the gene groups are significantly enriched in important hallmarks and pathways. We also generate spatially enriched heatmaps to further investigate links between histological patterns and DNA methylation states. To the best of our knowledge, this is the first study that explores the association of spatially resolved histological patterns with gene group methylation states across multiple cancer types using weakly supervised deep learning.
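The TL;DR mentions a pairwise ranking objective but this page does not give its form. As a minimal sketch, assuming a hinge-style loss over all (positive, negative) slide pairs for one gene group, the idea can be illustrated as follows. The function name `pairwise_ranking_loss` and the `margin` parameter are hypothetical and are not taken from the paper.

```python
def pairwise_ranking_loss(scores, labels, margin=1.0):
    """Mean hinge loss over all (positive, negative) pairs.

    scores: model outputs, one per slide (higher = more likely status 1).
    labels: 0/1 methylation status of one gene group, one per slide.
    margin: hypothetical margin; the source does not specify one.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # no valid pairs, so nothing to rank
    # A pair is penalized when the positive slide does not outscore
    # the negative slide by at least `margin`.
    total = sum(max(0.0, margin - (p - n)) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


# One positive slide scored 1.0, two negative slides scored 0.5 and 0.0.
loss = pairwise_ranking_loss([1.0, 0.5, 0.0], [1, 0, 0])
```

A loss of this kind depends only on the relative ordering of slides, which may be one reason a ranking objective pairs naturally with a ranking metric such as AUROC.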

Paper Structure

This paper contains 10 sections, 4 equations, 6 figures, 4 tables.

Figures (6)

  • Figure 1: The proposed pipeline of SlideGraph$^{methyl}$ for the prediction of the gene group-level methylation status from WSIs. We extract feature representations for the WSI patches to construct a WSI-level graph. This is then fed into a graph neural network to predict the methylation state for a gene group. The hierarchically-clustered heatmaps illustrate the differential methylation (DM) values obtained using MethylMix which serve as the ground truth for this classification problem.
  • Figure 2: Results of hierarchical clustering of the differential methylation (DM) values for TCGA-GBMLGG (a) and TCGA-KIRC (b). The dendrograms illustrate the clustering of genes (x-axis) based on the DM values for patient cohorts (y-axis). The word clouds show the genes in each gene group for TCGA-GBMLGG (c) and TCGA-KIRC (d); red, blue and pink indicate hyper-methylated, hypo-methylated and normally-methylated genes, based on their median values across the patient cohort. Font sizes carry no meaning.
  • Figure 3: Example WSIs for TCGA-GBMLGG gene group 0 for status = 0 (top row) and status = 1 (bottom row) and the corresponding heatmaps. Additionally, we show magnified highly contributing ROIs identified by the proposed method for status = 0 (blue) and status = 1 (red).
  • Figure 4: Example WSIs for TCGA-KIRC gene group 0 for status = 0 (top row) and status = 1 (bottom row) and the corresponding heatmaps. Additionally, we show magnified highly contributing ROIs identified by the proposed method for status = 0 (blue) and status = 1 (red).
  • Figure 5: (a) Boxplots showing the AUROC distribution of SlideGraph$^\infty$ and SlideGraph$^{methyl}$ for the three gene groups across 1,000 bootstrap runs for TCGA-GBMLGG. (b) Boxplots showing the AUROC distribution of SlideGraph$^\infty$ and SlideGraph$^{methyl}$ for the two gene groups across 1,000 bootstrap runs for TCGA-KIRC.
  • ...and 1 more figure
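The Figure 5 caption reports AUROC distributions across 1,000 bootstrap runs. This page does not describe the resampling protocol, so the following is only a sketch under the assumption that slides are resampled with replacement and AUROC is recomputed on each resample. The names `auroc` and `bootstrap_auroc`, and the `seed` parameter, are hypothetical.

```python
import random


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked
    correctly, with ties counted as half a correct pair."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else (0.5 if p == n else 0.0)
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def bootstrap_auroc(labels, scores, n_runs=1000, seed=0):
    """Return a list of AUROC values, one per bootstrap resample.

    Assumption (not from the source): slides are drawn with replacement,
    and resamples containing a single class are redrawn.
    """
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    values = []
    while len(values) < n_runs:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # AUROC is undefined for this resample; draw again
        values.append(auroc(ys, [scores[i] for i in idx]))
    return values


# Toy example with four slides; the values could feed a boxplot.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
point_estimate = auroc(toy_labels, toy_scores)  # 3 of 4 pairs ranked correctly
distribution = bootstrap_auroc(toy_labels, toy_scores, n_runs=200)
```

The spread of `distribution` is what a boxplot like those in Figure 5 would summarize; a narrow box indicates that the estimate is stable under resampling.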