GeSubNet: Gene Interaction Inference for Disease Subtype Network Generation

Ziwei Yang, Zheng Chen, Xin Liu, Rikuto Kotoge, Peng Chen, Yasuko Matsubara, Yasushi Sakurai, Jimeng Sun

TL;DR

GeSubNet addresses the challenge of deriving disease subtype-specific gene networks by learning a unified representation that combines patient gene expression with prior knowledge graphs. The framework has three modules: Patient-M (VQ-VAE-based subtype encoding), Graph-M (Neo-GNN-based encoding of the prior network), and Infer-M (integration of the two representations to generate subtype-specific networks). Together they produce sparse, biologically meaningful subtype graphs. Across four TCGA cancer types, GeSubNet substantially outperforms baselines on graph similarity and diversity metrics, and GO enrichment plus a novel simulated gene knockout analysis indicate strong biological relevance (e.g., an 83% shift likelihood for high-ranking genes in BRCA). The work shows that integrating experimental data with curated gene networks can yield targeted networks that reflect subtype biology, with potential implications for biomarker discovery and precision oncology.
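Patient-M is described as VQ-VAE-based. The core of a VQ-VAE is its quantization step: each encoder output is replaced by the nearest vector in a learned codebook, and the index of that vector acts as a discrete cluster label, here read as a candidate subtype. The sketch below shows only that nearest-codebook lookup and the standard VQ-VAE codebook/commitment loss terms in plain Python. The encoder and decoder are omitted, and the codebook size, the toy vectors, and beta = 0.25 are illustrative assumptions, not GeSubNet's actual configuration.

```python
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(z_e: Vector, codebook: List[Vector]) -> Tuple[int, Vector]:
    """Replace an encoder output z_e with its nearest codebook vector.

    Returns the code index (read here as a hypothetical subtype label)
    and the quantized vector z_q.
    """
    index = min(range(len(codebook)),
                key=lambda k: squared_distance(z_e, codebook[k]))
    return index, codebook[index]


def vq_loss_terms(z_e: Vector, z_q: Vector, beta: float = 0.25) -> Tuple[float, float]:
    """Codebook and commitment terms of the standard VQ-VAE objective.

    With autodiff, the two terms differ only in where the stop-gradient is
    applied; as plain numbers both equal the squared distance, and the
    commitment term is additionally scaled by beta.
    """
    d = squared_distance(z_e, z_q)
    return d, beta * d


# Toy example: three codebook vectors standing in for three subtypes.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
patients = [[0.1, -0.2], [0.9, 1.2], [-0.8, 0.7]]
labels = [quantize(z, codebook)[0] for z in patients]
print(labels)  # [0, 1, 2]
```

Because the code index is discrete, patients that map to the same codebook vector form a group without any subtype labels being supplied.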

Abstract

Retrieving gene functional networks from knowledge databases presents a challenge due to the mismatch between disease networks and subtype-specific variations. Current solutions, including statistical and deep learning methods, often fail to effectively integrate gene interaction knowledge from databases or to explicitly learn subtype-specific interactions. To address this mismatch, we propose GeSubNet, which learns a unified representation capable of predicting gene interactions while distinguishing between different disease subtypes. Graphs generated from such representations can be considered subtype-specific networks. GeSubNet is a multi-step representation learning framework with three modules. First, a deep generative model learns distinct disease subtypes from patient gene expression profiles. Second, a graph neural network captures representations of prior gene networks from knowledge databases, ensuring accurate physical gene interactions. Finally, we integrate these two representations using an inference loss that leverages graph generation capabilities, conditioned on the patient separation loss, to refine subtype-specific information in the learned representation. GeSubNet consistently outperforms traditional methods, with improvements of 30.6%, 21.0%, 20.1%, and 56.6% on four graph evaluation metrics, each averaged over four cancer datasets. In particular, we conduct a biological simulation experiment to assess how the behavior of selected genes from over 11,000 candidates affects subtypes or patient distributions. The results show that the generated network has the potential to identify subtype-specific genes with an 83% likelihood of impacting patient distribution shifts.
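The second and third steps rely on a graph model that scores gene pairs. A common way to turn node embeddings into a graph is a dot-product link decoder: the probability of an edge between genes u and v is sigmoid(z_u · z_v), trained with binary cross-entropy against known edges (positives) and sampled non-edges (negatives), after which a sparse graph is read off by thresholding. The sketch below illustrates only that generic pattern. GeSubNet's Graph-M builds on Neo-GNN, whose scoring additionally uses neighborhood-overlap structure, and the gene names, embeddings, and 0.5 threshold here are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Embeddings = Dict[str, List[float]]
Edge = Tuple[str, str]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def edge_probability(z: Embeddings, u: str, v: str) -> float:
    """Dot-product decoder: p(u, v) = sigmoid(z_u . z_v)."""
    return sigmoid(sum(a * b for a, b in zip(z[u], z[v])))


def link_bce_loss(z: Embeddings, positives: List[Edge], negatives: List[Edge]) -> float:
    """Mean binary cross-entropy: known edges are labeled 1, sampled non-edges 0."""
    eps = 1e-12  # avoids log(0)
    losses = [-math.log(edge_probability(z, u, v) + eps) for u, v in positives]
    losses += [-math.log(1.0 - edge_probability(z, u, v) + eps) for u, v in negatives]
    return sum(losses) / len(losses)


def generate_graph(z: Embeddings, threshold: float = 0.5) -> Set[Edge]:
    """Keep every gene pair whose predicted edge probability exceeds the threshold."""
    return {(u, v) for u, v in combinations(sorted(z), 2)
            if edge_probability(z, u, v) > threshold}


# Hypothetical gene embeddings (not learned here).
z = {"GENE_A": [1.0, 0.5], "GENE_B": [0.8, 0.6], "GENE_C": [-1.0, 0.2]}
print(round(link_bce_loss(z, [("GENE_A", "GENE_B")], [("GENE_A", "GENE_C")]), 3))
print(generate_graph(z))  # {('GENE_A', 'GENE_B')}
```

Raising the threshold trades recall for sparsity, which is one simple lever for producing the sparse graphs the method aims for.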

Paper Structure

This paper contains 37 sections, 3 equations, 8 figures, and 9 tables.

Figures (8)

  • Figure 1: An example illustrating the mismatch issue in cancer gene networks. The BRCA gene network from the STRING database shows general interactions across various subtypes. Although a gene set with consistent behavior leads to the discovery of a sub-network, this sub-network cannot be directly linked to specific subtypes, such as Luminal A, Luminal B, or Basal-like.
  • Figure 2: Overview of GeSubNet. GeSubNet consists of three modules. Step 1: Patient-M sets up an unsupervised cancer subtyping task to learn a patient sample representation ($\mathbf{Z_p}$) from the input gene expression data ($\mathbf{X}$) that distinguishes subtypes. Step 2: Graph-M sets up a link prediction task to train the GNN encoder and decoder, learning the graph representation ($\mathbf{Z_g}$) from the input gene graph ($\mathcal{G}$) and expression data ($\mathbf{X}$). Step 3: Infer-M uses an objective function that integrates the two representations to generate subtype-specific networks. The Patient-M reconstruction, conditioned on the GNN trained in Graph-M ($q_{\theta}(\mathbf{z_g}|\mathcal{G})$), refines the graph structure so that it maintains accurate patient profile reconstruction ($p_{\phi}(\mathbf{x}|\mathbf{\tilde{x}})$).
  • Figure 3: Venn diagrams of the overlap in GO terms obtained by different methods (WGCNA, CSGNN, LR-GNN, and GeSubNet) across four cancers. Shared and unique functional terms are listed; a full list is provided in Appendix \ref{sec:App_GO}. Unique terms that are well supported by biological evidence are highlighted in bold.
  • Figure 4: (a) UMAP visualization of an example patient distribution before and after a simulated gene knockout for a target subtype. Gray points in the main figure represent the negative control groups (subtypes). The small figures at the bottom left show the original distributions of the different subtypes. High-ranking genes are knocked out in the right subfigure and low-ranking genes in the left. (b) Table: shift rates ($\Delta_{\text{SR}}$) when knocking out high- and low-ranking genes identified by different methods. The best results are highlighted in bold.
  • Figure 5: The obtained gene networks for two BRCA patient groups: the Normal-like group (network A) and the Basal-like group (network B).
  • ...and 3 more figures
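Figure 4 reports shift rates ($\Delta_{\text{SR}}$) after simulated knockouts of high- and low-ranking genes. This summary does not give the exact definition, so the sketch below assumes one plausible reading: set the expression of the knocked-out genes to zero, reassign each patient of the target subtype to its nearest subtype centroid, and report the fraction of those patients whose assignment changes. The nearest-centroid assignment (a stand-in for re-encoding with Patient-M), the zeroing strategy, and all data are hypothetical.

```python
from typing import Dict, List, Sequence

Profile = Sequence[float]


def nearest_centroid(x: Profile, centroids: Dict[str, Profile]) -> str:
    """Assign a profile to the subtype with the closest centroid (squared Euclidean)."""
    return min(centroids,
               key=lambda s: sum((a - b) ** 2 for a, b in zip(x, centroids[s])))


def knock_out(x: Profile, gene_indices: Sequence[int]) -> List[float]:
    """Simulated knockout: return a copy of the profile with selected genes set to zero."""
    knocked = list(x)
    for i in gene_indices:
        knocked[i] = 0.0
    return knocked


def shift_rate(patients: Sequence[Profile], target: str,
               centroids: Dict[str, Profile], gene_indices: Sequence[int]) -> float:
    """Fraction of target-subtype patients whose assignment changes after the knockout."""
    members = [x for x in patients if nearest_centroid(x, centroids) == target]
    if not members:
        return 0.0
    shifted = sum(nearest_centroid(knock_out(x, gene_indices), centroids) != target
                  for x in members)
    return shifted / len(members)


# Toy data: gene 0 separates S1 from S2; gene 2 does not.
centroids = {"S1": [2.0, 1.0, 1.0], "S2": [0.0, 1.0, 1.0]}
patients = [[2.1, 1.0, 0.9], [1.9, 1.1, 1.0], [0.1, 1.0, 1.0]]
print(shift_rate(patients, "S1", centroids, [0]))  # 1.0 (discriminative gene)
print(shift_rate(patients, "S1", centroids, [2]))  # 0.0 (non-discriminative gene)
```

Comparing the rate for high-ranking genes against low-ranking genes, as in Figure 4(b), acts as a control: a useful gene ranking should produce a much larger shift when its top genes are removed.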