Causal Inference, Biomarker Discovery, Graph Neural Network, Feature Selection

Chaowang Lan, Jingxin Wu, Yulong Yuan, Chuxun Liu, Huangyi Kang, Caihua Liu

TL;DR

This work tackles biomarker discovery from transcriptomic data by introducing a causal graph neural network (Causal-GNN) that combines causal inference with multi-layer GNNs to estimate gene-specific causal effects. By constructing a gene regulatory network, computing propensity scores with a three-layer GCN, and estimating average causal effects, the method identifies stable, biologically meaningful biomarkers. Across four heterogeneous datasets and four classifiers, it achieves high predictive accuracy while yielding compact biomarker panels, and stability analyses show robust reproducibility across resampling. GO enrichment of top GBM biomarkers further supports their biological relevance, highlighting the framework's potential for broad precision medicine applications.
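
The last step named above, turning propensity scores into a per-gene average causal effect, can be sketched as follows. This summary does not state the paper's estimator, so the sketch assumes a standard normalized inverse-propensity-weighted (Hajek) estimator. The function name `average_causal_effect`, the median split that defines a "treated" sample, and the binary outcome coding are all illustrative assumptions rather than the authors' definitions.

```python
from statistics import median


def average_causal_effect(expression, outcome, propensity, eps=1e-6):
    """Normalized IPW estimate of one gene's average causal effect.

    expression: per-sample expression values of the gene
    outcome:    per-sample binary label (1 = case, 0 = control)
    propensity: per-sample probability of being "treated", for
                example the output of a propensity model
    """
    # Assumption: a sample counts as "treated" when the gene's
    # expression lies above that gene's median across samples.
    cutoff = median(expression)
    treated = [1 if x > cutoff else 0 for x in expression]

    w_treated = y_treated = w_control = y_control = 0.0
    for t, y, p in zip(treated, outcome, propensity):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid division by zero
        if t:
            w = 1.0 / p
            w_treated += w
            y_treated += w * y
        else:
            w = 1.0 / (1.0 - p)
            w_control += w
            y_control += w * y

    if w_treated == 0.0 or w_control == 0.0:
        raise ValueError("need both treated and untreated samples")

    # Difference of the two weighted outcome means.
    return y_treated / w_treated - y_control / w_control
```

Under these assumptions, genes would be ranked by the absolute value of the returned effect and the top-ranked ones kept as the biomarker panel. With constant propensities the estimate reduces to a plain difference in outcome means between the high- and low-expression groups.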

Abstract

Biomarker discovery from high-throughput transcriptomic data is crucial for advancing precision medicine. However, existing methods often neglect gene-gene regulatory relationships and lack stability across datasets, leading to conflation of spurious correlations with genuine causal effects. To address these issues, we develop a causal graph neural network (Causal-GNN) method that integrates causal inference with multi-layer graph neural networks (GNNs). The key innovation is the incorporation of causal effect estimation for identifying stable biomarkers, coupled with a GNN-based propensity scoring mechanism that leverages cross-gene regulatory networks. Experimental results demonstrate that our method achieves consistently high predictive accuracy across four distinct datasets and four independent classifiers. Moreover, it enables the identification of more stable biomarkers compared to traditional methods. Our work provides a robust, efficient, and biologically interpretable tool for biomarker discovery, demonstrating strong potential for broad application across medical disciplines.
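
One common way to quantify the stability claimed above is to rerun feature selection on resampled data and measure how much the selected gene panels overlap. The abstract does not name the stability measure the authors use, so the sketch below assumes the mean pairwise Jaccard index over bootstrap resamples. The names `jaccard`, `selection_stability`, `resampled_panels`, and the `select` callback are hypothetical.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene panels (1.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(panels):
    """Mean pairwise Jaccard index over a list of gene panels."""
    pairs = list(combinations(panels, 2))
    if not pairs:
        raise ValueError("need at least two panels")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def resampled_panels(samples, select, n_rounds=20, seed=0):
    """Run `select` on bootstrap resamples and collect its panels.

    samples: list of per-sample records (any structure)
    select:  hypothetical callback mapping a list of samples to an
             iterable of selected gene names
    """
    rng = random.Random(seed)
    n = len(samples)
    panels = []
    for _ in range(n_rounds):
        # Draw n samples with replacement.
        boot = [samples[rng.randrange(n)] for _ in range(n)]
        panels.append(set(select(boot)))
    return panels
```

A score of 1.0 means every resample produced the same panel, and a score near 0 means the panels barely overlap. Comparing this score between selection methods on the same resamples is one way to check a claim of greater stability.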

Paper Structure

This paper contains 13 sections, 9 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The framework of our methodology. Step 1: constructing the gene regulatory network; Step 2: calculating the propensity score via a graph neural network; Step 3: calculating the average causal effect of each gene.
  • Figure 2: Training loss trajectories of three different GCN architectures with varying network depths (2-layer, 3-layer, and 4-layer) over 200 epochs.
  • Figure 3: F1-score, accuracy, and number of selected features of six feature-selection methods on four datasets across four classifiers.
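
Step 2 of Figure 1 and the depth comparison in Figure 2 both rest on a GCN forward pass, which can be sketched as below. The captions do not specify the layer rule, node features, or output head, so the sketch assumes the standard symmetric-normalized propagation rule, nodes that each carry a feature vector, and a single sigmoid output per node read as a propensity score. The weights are random and untrained, and the helper names (`normalize_adjacency`, `gcn_forward`, `random_weights`) are illustrative. Plain Python is used to keep the example dependency-free.

```python
import math
import random


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for an adjacency matrix
    without self-loops and with non-negative entries."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    inv_sqrt_deg = [1.0 / math.sqrt(sum(row)) for row in a_hat]
    return [[inv_sqrt_deg[i] * a_hat[i][j] * inv_sqrt_deg[j]
             for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Dense matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)]
            for row in x]


def random_weights(dims, seed=0):
    """One weight matrix per layer; dims = [in, hidden, ..., out]."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.5, 0.5) for _ in range(d_out)]
             for _ in range(d_in)]
            for d_in, d_out in zip(dims, dims[1:])]


def gcn_forward(adj, features, weights):
    """Apply len(weights) GCN layers: ReLU between layers, sigmoid
    on the first output column of the last layer."""
    a_norm = normalize_adjacency(adj)
    h = features
    for k, w in enumerate(weights):
        h = matmul(matmul(a_norm, h), w)  # aggregate neighbors, then project
        if k < len(weights) - 1:
            h = [[max(0.0, v) for v in row] for row in h]
    return [1.0 / (1.0 + math.exp(-row[0])) for row in h]


# Toy example: a 3-node chain graph with 2 features per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[0.2, 1.0], [0.5, 0.1], [0.9, 0.4]]
weights = random_weights([2, 4, 4, 1])  # three layers
scores = gcn_forward(adj, features, weights)
```

Changing the `dims` list to `[2, 4, 1]` or `[2, 4, 4, 4, 1]` gives the 2-layer and 4-layer variants. In practice the weights would be fitted by minimizing a loss over training epochs, which this sketch omits.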