Interpretable High-order Knowledge Graph Neural Network for Predicting Synthetic Lethality in Human Cancers

Xuexin Chen, Ruichu Cai, Zhengting Huang, Zijian Li, Jie Zheng, Min Wu

TL;DR

This work addresses the challenge of predicting synthetic lethality while providing trustworthy and diverse explanations. It introduces DGIB4SL, a knowledge‑graph–driven GNN that replaces attention with a Diverse Graph Information Bottleneck (DGIB) objective, augmented by a Determinantal Point Process (DPP) constraint to encourage diverse core subgraphs for the same gene pair. A motif‑based encoder captures high‑order graph structures by aggregating 13 motif views with injective concatenation, yielding robust subgraph representations for prediction. Empirically, DGIB4SL achieves state‑of‑the‑art SL prediction performance on SynLethKG/SynLethDB and delivers multiple informative explanations that reveal diverse biological mechanisms; ablations confirm the importance of motifs and diversity constraints, while stability analyses show reduced variability compared to attention‑based baselines.

Abstract

Synthetic lethality (SL) is a promising gene interaction for cancer therapy. Recent SL prediction methods integrate knowledge graphs (KGs) into graph neural networks (GNNs) and employ attention mechanisms to extract local subgraphs as explanations for target gene pairs. However, attention mechanisms often lack fidelity, typically generate a single explanation per gene pair, and fail to ensure trustworthy high-order structures in their explanations. To overcome these limitations, we propose Diverse Graph Information Bottleneck for Synthetic Lethality (DGIB4SL), a KG-based GNN that generates multiple faithful explanations for the same gene pair and effectively encodes high-order structures. Specifically, we introduce a novel DGIB objective, integrating a Determinantal Point Process (DPP) constraint into the standard IB objective, and employ 13 motif-based adjacency matrices to capture high-order structures in gene representations. Experimental results show that DGIB4SL outperforms state-of-the-art baselines and provides multiple explanations for SL prediction, revealing diverse biological mechanisms underlying SL inference.
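
To give some intuition for how a DPP-style constraint can reward diverse explanations, the following minimal sketch scores a set of $K$ representation vectors by the log-determinant of their cosine-similarity kernel. The function name `dpp_log_det`, the cosine kernel, and the `eps` jitter are illustrative assumptions; the exact kernel and weighting used in DGIB4SL may differ.

```python
import numpy as np

def dpp_log_det(Z, eps=1e-6):
    """Log-determinant of a cosine-similarity kernel over the rows of Z.

    Hypothetical helper for illustration: larger values mean the K rows
    are more spread out (more diverse); near-duplicate rows drive the
    determinant toward zero and the score toward -inf.
    """
    Z = np.asarray(Z, dtype=float)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    Zn = Z / np.clip(norms, 1e-12, None)      # unit-length rows
    L = Zn @ Zn.T                             # K x K similarity kernel
    # eps * I keeps the kernel strictly positive definite (assumption).
    _, logdet = np.linalg.slogdet(L + eps * np.eye(L.shape[0]))
    return logdet

# Two orthogonal representations versus two identical ones.
diverse = dpp_log_det([[1.0, 0.0], [0.0, 1.0]])
redundant = dpp_log_det([[1.0, 0.0], [1.0, 0.0]])
print(diverse > redundant)
```

Adding such a term to a training loss with a negative sign would push the $K$ subgraph representations apart, which is the role the abstract attributes to the DPP constraint.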

Paper Structure

This paper contains 35 sections, 2 theorems, 27 equations, 10 figures, 7 tables.

Key Result

Proposition 1

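(Upper bound of $-I(Y;\widetilde{G}^{uv}_k)$). For a graph $G^{uv}$ with label $Y$ and the $k$-th IB-graph $\widetilde{G}_k^{uv}$ learned from $G^{uv}$, we have

$$-I(Y;\widetilde{G}^{uv}_k) \le \mathbb{E}_{p(Y,\widetilde{G}^{uv}_k)}\left[-\log q_\theta(Y|\widetilde{G}^{uv}_k)\right] - H(Y),$$

where $q_\theta(Y| \widetilde{G}^{uv}_k)$, parameterized by $\theta$, is the variational approximation of $p(Y| \widetilde{G}^{uv}_k)$ and $H(Y)$ is the entropy of $Y$.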
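
A bound of this kind follows from the standard variational argument for mutual information. The sketch below uses expectation notation and the entropy $H(Y)$ as assumptions; the paper's own proof may be organized differently.

$$
\begin{aligned}
I(Y;\widetilde{G}^{uv}_k)
&= \mathbb{E}_{p(Y,\widetilde{G}^{uv}_k)}\left[\log \frac{p(Y|\widetilde{G}^{uv}_k)}{p(Y)}\right] \\
&= \mathbb{E}_{p(Y,\widetilde{G}^{uv}_k)}\left[\log q_\theta(Y|\widetilde{G}^{uv}_k)\right]
 + \mathbb{E}_{p(\widetilde{G}^{uv}_k)}\left[\mathrm{KL}\left(p(Y|\widetilde{G}^{uv}_k)\,\|\,q_\theta(Y|\widetilde{G}^{uv}_k)\right)\right]
 + H(Y) \\
&\ge \mathbb{E}_{p(Y,\widetilde{G}^{uv}_k)}\left[\log q_\theta(Y|\widetilde{G}^{uv}_k)\right] + H(Y).
\end{aligned}
$$

The inequality holds because the KL divergence is non-negative. Negating both sides gives $-I(Y;\widetilde{G}^{uv}_k) \le \mathbb{E}_{p(Y,\widetilde{G}^{uv}_k)}\left[-\log q_\theta(Y|\widetilde{G}^{uv}_k)\right] - H(Y)$. Since $H(Y)$ does not depend on $\theta$, minimizing the cross-entropy term minimizes the upper bound.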

Figures (10)

  • Figure 1: Toy example of a knowledge graph with self-loops integrating biological context and relevant mechanisms between the given gene pair BRCA1 and PARP1. The purple and blue subgraphs illustrate mechanisms where either the conversion of SSBs to DSBs or the blockage of replication forks leads to DNA damage in the absence of HR (Helleday, 2011). The gray subgraph represents the predicted core subgraph of an attention-based method. A GIB-based method identifies only one correct subgraph, while our DGIB4SL can find all correct subgraphs (purple and blue). HR, SSB, and DSB are abbreviations for "Homologous Recombination", "Single Strand Break", and "Double Strand Break", respectively. The self-loops are not depicted for brevity.
  • Figure 2: Overview of DGIB4SL. DGIB4SL takes the enclosing graph data $G^{uv} = (A^{uv}, X^{uv}, E^{uv})$ around genes $u$ and $v$ as input, passes it through phases (a), (b), and (c), and outputs the interaction confidence of the gene pair $(u, v)$ together with $K$ IB-graphs $\widetilde{G}^{uv}_1$, ..., $\widetilde{G}^{uv}_K$ that capture the high-order graph structure. In phase (a), an IB-graph $\widetilde{G}^{uv}_k$ is generated by injecting random noise to select important edges, with edge weights $B^{uv}_k$ estimated from $G^{uv}$ by the edge weight estimation module (Eq. S11). $B^{uv}_k$ serves as the parameter of a multi-dimensional Bernoulli distribution, from which the adjacency matrix of $\widetilde{G}^{uv}_k$ is sampled. In phase (b), IB-graph representations are learned via variational estimation: each IB-graph $\widetilde{G}^{uv}_k$ is passed through the same motif-based GNN $f_\phi$ (Eq. 10) to obtain a distribution from which a representation $\widetilde{Z}^{uv}_k$ is sampled. The motif-based GNN, shown in subfigure (d), projects the IB-graph into 13 motif-based matrices. Each motif-based matrix $M^{uv}_k$ is processed by a different GIN encoder to produce motif-wise representations, which are then concatenated (Eq. 15). In phase (c), each IB-graph representation is passed through an MLP-based classifier to make $K$ predictions (Eq. 11). During training, the representations and predictions are used to compute the DPP and GIB terms, which are jointly optimized in DGIB4SL.
  • Figure 3: All 3-node motifs in a directed and unweighted graph.
  • Figure 4: Two explanations learned by DGIB4SL provide different insights into the biological mechanisms underlying SL for the same gene pair (BRCA1, E2F1). For details on the edge nomenclature, please refer to Table S4 in the Appendix.
  • Figure 5: Ablation study of DGIB4SL for Motif, DGIB, and DPP, reporting NDCG@10, Recall@10, Precision@10, and MAP@10 (left Y-axis) and the diversity metric DPP (right Y-axis).
  • ...and 5 more figures
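
The Figure 2 caption describes two steps that are easy to illustrate in isolation: sampling an IB-graph adjacency matrix from Bernoulli edge probabilities (phase a), and turning a graph into a motif-based matrix (subfigure d). The sketch below is a simplified illustration under stated assumptions, not the authors' implementation: `sample_ib_adjacency` draws hard samples (training code would typically need a differentiable relaxation), and `cycle_motif_adjacency` covers only one of the 13 three-node motifs, the directed 3-cycle, using the co-occurrence counting common in the motif-clustering literature. Both function names are hypothetical.

```python
import numpy as np

def sample_ib_adjacency(A, B, rng):
    """Keep each existing edge (i, j) of A independently with probability B[i, j]."""
    keep = rng.random(np.shape(A)) < np.asarray(B)   # Bernoulli draws
    return (np.asarray(A).astype(bool) & keep).astype(int)

def cycle_motif_adjacency(A):
    """Motif-based adjacency for the directed 3-cycle i -> k -> j -> i.

    Entry (i, j) counts the 3-cycles that contain both nodes i and j.
    """
    A = (np.asarray(A) != 0).astype(int)
    np.fill_diagonal(A, 0)        # self-loops cannot belong to a 3-node motif
    U = A * (1 - A.T)             # keep unidirectional edges only
    C = (U @ U) * U.T             # paths i -> k -> j closed by the edge j -> i
    return C + C.T                # symmetric co-occurrence counts

# Toy graph: a 3-cycle 0 -> 1 -> 2 -> 0 plus a dangling edge 2 -> 3.
A = np.array([[0, 1, 0, 0],
              [0, 0, 1, 0],
              [1, 0, 0, 1],
              [0, 0, 0, 0]])
rng = np.random.default_rng(0)
B = np.full(A.shape, 0.9)         # hypothetical edge probabilities
A_ib = sample_ib_adjacency(A, B, rng)
print(cycle_motif_adjacency(A))
```

In the toy graph, nodes 0, 1, and 2 share one 3-cycle, so their pairwise entries are 1, while node 3 has no motif neighbors. In the architecture described by the caption, each such motif matrix would feed its own GIN encoder, and the resulting motif-wise representations would be concatenated.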

Theorems & Definitions (9)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5
  • Proposition 1
  • proof
  • Proposition 2
  • proof