RGE-GCN: Recursive Gene Elimination with Graph Convolutional Networks for RNA-seq based Early Cancer Detection
Shreyas Shende, Varsha Narayanan, Vishal Fenn, Yiran Huang, Dincer Goksuluk, Gaurav Choudhary, Melih Agraz, Mengjia Xu
TL;DR
The paper introduces RGE-GCN, a novel end-to-end framework that jointly performs recursive gene elimination and cancer/normal classification directly from bulk RNA-seq data. By constructing a sample-sample graph, training a Graph Convolutional Network, and using Integrated Gradients to guide feature pruning, the method yields compact, interpretable gene signatures. Across synthetic and real cancer datasets (lung, cervical, kidney), RGE-GCN achieves high accuracy and F1-scores, often surpassing traditional DEG-based pipelines while revealing biologically meaningful pathways such as PI3K-AKT, MAPK, SUMOylation, and immune regulation. The work highlights the framework's potential for generalizable RNA-seq biomarker discovery and early cancer detection, while noting computational cost and avenues for multi-omics integration and transfer learning.
Abstract
Early detection of cancer plays a key role in improving survival rates, but identifying reliable biomarkers from RNA-seq data is still a major challenge. The data are high-dimensional, and conventional statistical methods often fail to capture the complex relationships between genes. In this study, we introduce RGE-GCN (Recursive Gene Elimination with Graph Convolutional Networks), a framework that combines feature selection and classification in a single pipeline. Our approach builds a graph from gene expression profiles, uses a Graph Convolutional Network to classify cancer versus normal samples, and applies Integrated Gradients to highlight the most informative genes. By recursively removing less relevant genes, the model converges to a compact set of biomarkers that are both interpretable and predictive. We evaluated RGE-GCN on synthetic data as well as real-world RNA-seq cohorts of lung, kidney, and cervical cancers. Across all datasets, the method consistently achieved higher accuracy and F1-scores than standard tools such as DESeq2, edgeR, and limma-voom. Importantly, the selected genes aligned with well-known cancer pathways including PI3K-AKT, MAPK, SUMOylation, and immune regulation. These results suggest that RGE-GCN shows promise as a generalizable approach for RNA-seq based early cancer detection and biomarker discovery (https://rce-gcn.streamlit.app/).
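The loop described in the abstract (build a sample-sample graph, train a GCN, score genes with Integrated Gradients, drop the weakest genes, repeat) can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a Euclidean k-nearest-neighbour graph, a single linear graph-convolution layer with a sigmoid output, an all-zeros baseline for Integrated Gradients, and a fixed fraction of genes dropped per round. The function names (`knn_adjacency`, `train_gcn`, `integrated_gradients`, `recursive_gene_elimination`) and all hyperparameters are hypothetical choices made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def knn_adjacency(X, k=5):
    """Symmetric kNN sample-sample graph, normalized as D^-1/2 (A + I) D^-1/2."""
    n = X.shape[0]
    sq = (X ** 2).sum(axis=1)
    d = sq[:, None] + sq[None, :] - 2.0 * X @ X.T   # squared Euclidean distances
    np.fill_diagonal(d, np.inf)                     # exclude self from neighbours
    idx = np.argsort(d, axis=1)[:, :k]
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    A = np.maximum(A, A.T) + np.eye(n)              # symmetrize, add self-loops
    dinv = 1.0 / np.sqrt(A.sum(axis=1))
    return A * dinv[:, None] * dinv[None, :]

def train_gcn(A_hat, X, y, epochs=300, lr=0.1):
    """One-layer GCN, p = sigmoid(A_hat X w + b), fit by gradient descent on BCE."""
    n, g = X.shape
    w, b = np.zeros(g), 0.0
    H = A_hat @ X                                   # neighbourhood-smoothed features
    for _ in range(epochs):
        err = sigmoid(H @ w + b) - y                # gradient of BCE w.r.t. logits
        w -= lr * H.T @ err / n
        b -= lr * err.mean()
    return w, b

def integrated_gradients(A_hat, X, y, w, b, steps=100):
    """IG of the summed true-class probability w.r.t. X, zero baseline.

    Returns an (n_samples, n_genes) attribution matrix (midpoint Riemann sum).
    """
    c = np.where(y == 1, 1.0, -1.0)                 # d(true-class prob)/dp
    h = A_hat @ X @ w                               # logits scale linearly along the path
    s_avg = np.zeros(X.shape[0])
    for a in (np.arange(steps) + 0.5) / steps:
        p = sigmoid(a * h + b)
        s_avg += A_hat.T @ (c * p * (1.0 - p))      # per-sample part of the gradient
    s_avg /= steps
    return X * np.outer(s_avg, w)                   # (input - baseline) * mean gradient

def recursive_gene_elimination(X, y, n_keep=5, drop_frac=0.5, k=5):
    """Repeatedly retrain and drop the lowest-attribution genes until n_keep remain."""
    genes = np.arange(X.shape[1])
    while genes.size > n_keep:
        Xs = X[:, genes]
        A_hat = knn_adjacency(Xs, k)                # graph rebuilt on surviving genes
        w, b = train_gcn(A_hat, Xs, y)
        importance = np.abs(integrated_gradients(A_hat, Xs, y, w, b)).mean(axis=0)
        n_next = max(n_keep, int(np.ceil(genes.size * (1.0 - drop_frac))))
        keep = np.argsort(importance)[::-1][:n_next]
        genes = genes[np.sort(keep)]
    return genes

# Toy data: 60 samples, 40 "genes", the first 3 shifted in the positive class.
rng = np.random.default_rng(0)
y = (np.arange(60) % 2).astype(float)
X = rng.normal(size=(60, 40))
X[:, :3] += 3.0 * y[:, None]
X = (X - X.mean(axis=0)) / X.std(axis=0)            # z-score each gene
print("selected gene indices:", recursive_gene_elimination(X, y, n_keep=5))
```

The sketch rebuilds the graph in every round so that sample similarity reflects only the surviving genes; whether RGE-GCN does the same is not stated in the abstract. A real pipeline would also need normalized RNA-seq counts, a held-out split for reporting accuracy and F1-score, and a deeper nonlinear GCN, none of which are shown here.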
