RGE-GCN: Recursive Gene Elimination with Graph Convolutional Networks for RNA-seq based Early Cancer Detection

Shreyas Shende, Varsha Narayanan, Vishal Fenn, Yiran Huang, Dincer Goksuluk, Gaurav Choudhary, Melih Agraz, Mengjia Xu

TL;DR

The paper introduces RGE-GCN, a novel end-to-end framework that jointly performs recursive gene elimination and cancer/normal classification directly from bulk RNA-seq data. By constructing a sample-sample graph, training a Graph Convolutional Network, and using Integrated Gradients to guide feature pruning, the method yields compact, interpretable gene signatures. Across synthetic and real cancer datasets (lung, cervical, kidney), RGE-GCN achieves high accuracy and F1-scores, often surpassing traditional DEG-based pipelines while revealing biologically meaningful pathways such as PI3K-AKT, MAPK, SUMOylation, and immune regulation. The work highlights the framework's potential for generalizable RNA-seq biomarker discovery and early cancer detection, while noting computational cost and avenues for multi-omics integration and transfer learning.

Abstract

Early detection of cancer plays a key role in improving survival rates, but identifying reliable biomarkers from RNA-seq data is still a major challenge. The data are high-dimensional, and conventional statistical methods often fail to capture the complex relationships between genes. In this study, we introduce RGE-GCN (Recursive Gene Elimination with Graph Convolutional Networks), a framework that combines feature selection and classification in a single pipeline. Our approach builds a graph from gene expression profiles, uses a Graph Convolutional Network to classify cancer versus normal samples, and applies Integrated Gradients to highlight the most informative genes. By recursively removing less relevant genes, the model converges to a compact set of biomarkers that are both interpretable and predictive. We evaluated RGE-GCN on synthetic data as well as real-world RNA-seq cohorts of lung, kidney, and cervical cancers. Across all datasets, the method consistently achieved higher accuracy and F1-scores than standard tools such as DESeq2, edgeR, and limma-voom. Importantly, the selected genes aligned with well-known cancer pathways including PI3K-AKT, MAPK, SUMOylation, and immune regulation. These results suggest that RGE-GCN shows promise as a generalizable approach for RNA-seq based early cancer detection and biomarker discovery (https://rce-gcn.streamlit.app/).
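
As a rough illustration of the graph-building and classification steps described in the abstract, the sketch below constructs a sample-sample graph from Pearson correlations and runs an untrained three-layer GCN forward pass in NumPy. The correlation threshold, the use of absolute correlation, the layer widths, and the function names `pcc_sample_graph`, `normalize_adjacency`, and `gcn_forward` are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def pcc_sample_graph(X, threshold=0.5):
    """Binary sample-sample adjacency from Pearson correlation.

    X has shape (n_samples, n_genes). The threshold and the use of
    |correlation| are assumed choices, not values from the paper.
    """
    corr = np.corrcoef(X)               # rows are samples, so corr is (n, n)
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)          # self-loops are added during normalization
    return adj

def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2}."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(a_norm, X, weights):
    """GCN forward pass: ReLU on hidden layers, raw class logits at the output."""
    h = X
    for i, w in enumerate(weights):
        h = a_norm @ h @ w              # aggregate neighbors, then transform
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 50))           # toy data: 12 samples, 50 genes
adj = pcc_sample_graph(X, threshold=0.2)
a_norm = normalize_adjacency(adj)
# Three weight matrices give a three-layer network; widths 16 and 8 are arbitrary.
weights = [rng.normal(scale=0.1, size=s) for s in [(50, 16), (16, 8), (8, 2)]]
logits = gcn_forward(a_norm, X, weights)
print(logits.shape)                     # (12, 2): one cancer/normal logit pair per sample
```

In practice the weights would be trained with a cross-entropy loss in a deep learning framework; the forward pass here only shows how the normalized sample graph mixes information between correlated samples.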

Paper Structure

This paper contains 26 sections, 5 equations, 2 figures, 6 tables.

Figures (2)

  • Figure 1: Architecture of the proposed RGE-GCN framework for biomarker discovery. The process begins with an outer split of the gene expression data into training and held-out test sets. The main RGE loop operates on the training data, starting with an inner train/validation split. In each iteration, a sample-sample graph is constructed using the Pearson Correlation Coefficient (PCC). A three-layer Graph Convolutional Network (GCN) [Kipf and Welling, 2016] is trained on this graph and used to generate class logits. Gene importance scores are then derived from these logits using Integrated Gradients (IG) [Sundararajan et al., 2017]. Based on these scores, the least informative genes are eliminated. The gene set that maximizes validation accuracy is selected as optimal and is finally evaluated on the held-out test set.
  • Figure 2: Top 30 genes identified by our RGE-GCN model, ranked according to their normalized Integrated Gradients ($|IG|$) importance scores, for the Kidney, Cervical, and Lung disease datasets.
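
The elimination loop summarized in the Figure 1 caption can be sketched roughly as follows. To stay self-contained, this example swaps the GCN for a plain logistic regression (explicitly a stand-in, not the paper's model), so Integrated Gradients can be computed with an analytic gradient and a zero baseline. The drop fraction, minimum gene count, tie-breaking rule, baseline, toy data, and all function names are hypothetical choices for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=300):
    """Stand-in classifier (NOT the paper's GCN): logistic regression by gradient descent."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        err = sigmoid(X @ w + b) - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w, b

def integrated_gradients(X, w, b, steps=100):
    """Midpoint Riemann-sum IG of the class-1 probability, with an all-zero baseline."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.zeros_like(X, dtype=float)
    for a in alphas:
        p = sigmoid(a * (X @ w) + b)            # model output at the path point a * x
        grads += (p * (1.0 - p))[:, None] * w   # gradient of that output w.r.t. the input
    return X * grads / steps                    # (x - baseline) * average path gradient

def recursive_gene_elimination(X_tr, y_tr, X_val, y_val, drop_frac=0.2, min_genes=2):
    genes = np.arange(X_tr.shape[1])
    best_acc, best_genes = -1.0, genes
    while True:
        w, b = fit_logistic(X_tr[:, genes], y_tr)
        acc = np.mean((sigmoid(X_val[:, genes] @ w + b) >= 0.5) == y_val)
        if acc >= best_acc:                     # assumed rule: ties favor the smaller set
            best_acc, best_genes = acc, genes
        if len(genes) <= min_genes:
            break
        # Rank genes by mean |IG| over training samples and drop the weakest ones.
        score = np.abs(integrated_gradients(X_tr[:, genes], w, b)).mean(axis=0)
        n_keep = max(min_genes, int(len(genes) * (1 - drop_frac)))
        genes = genes[np.sort(np.argsort(score)[-n_keep:])]
    return best_genes, best_acc

# Toy data: 120 samples, 20 genes, only genes 0 and 1 carry signal (assumed setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))
y = (X[:, 0] - X[:, 1] > 0).astype(int)
X_tr, y_tr = X[:60], y[:60]          # inner training split
X_val, y_val = X[60:90], y[60:90]    # inner validation split
X_te, y_te = X[90:], y[90:]          # outer held-out test split

best_genes, val_acc = recursive_gene_elimination(X_tr, y_tr, X_val, y_val)
w, b = fit_logistic(X_tr[:, best_genes], y_tr)
test_acc = np.mean((sigmoid(X_te[:, best_genes] @ w + b) >= 0.5) == y_te)
print(len(best_genes), float(val_acc), float(test_acc))
```

The held-out test split is touched only once, after the gene set has been chosen on the validation split, which mirrors the outer/inner split described in the caption. With a real GCN, the gradients inside `integrated_gradients` would come from automatic differentiation rather than a closed-form expression.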