Interpretable High-order Knowledge Graph Neural Network for Predicting Synthetic Lethality in Human Cancers
Xuexin Chen, Ruichu Cai, Zhengting Huang, Zijian Li, Jie Zheng, Min Wu
TL;DR
This work addresses the challenge of predicting synthetic lethality while providing trustworthy and diverse explanations. It introduces DGIB4SL, a knowledge‑graph–driven GNN that replaces attention with a Diverse Graph Information Bottleneck (DGIB) objective, augmented by a Determinantal Point Process (DPP) constraint to encourage diverse core subgraphs for the same gene pair. A motif‑based encoder captures high‑order graph structures by aggregating 13 motif views with injective concatenation, yielding robust subgraph representations for prediction. Empirically, DGIB4SL achieves state‑of‑the‑art SL prediction performance on SynLethKG/SynLethDB and delivers multiple informative explanations that reveal diverse biological mechanisms; ablations confirm the importance of motifs and diversity constraints, while stability analyses show reduced variability compared to attention‑based baselines.
Abstract
Synthetic lethality (SL) is a type of gene interaction that offers promising targets for cancer therapy. Recent SL prediction methods integrate knowledge graphs (KGs) into graph neural networks (GNNs) and employ attention mechanisms to extract local subgraphs as explanations for target gene pairs. However, attention mechanisms often lack fidelity, typically generate a single explanation per gene pair, and fail to ensure trustworthy high-order structures in their explanations. To overcome these limitations, we propose Diverse Graph Information Bottleneck for Synthetic Lethality (DGIB4SL), a KG-based GNN that generates multiple faithful explanations for the same gene pair and effectively encodes high-order structures. Specifically, we introduce a novel DGIB objective that integrates a Determinantal Point Process (DPP) constraint into the standard IB objective, and we employ 13 motif-based adjacency matrices to capture high-order structures in gene representations. Experimental results show that DGIB4SL outperforms state-of-the-art baselines and provides multiple explanations for SL prediction, revealing diverse biological mechanisms underlying SL inference.
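
To give a rough intuition for how a DPP-style constraint can reward diverse explanations, the sketch below scores a set of subgraph embeddings by the log-determinant of their similarity kernel. It is an illustration only, not the DGIB objective itself: the cosine-similarity kernel, the jitter term, and the function names (`gram_matrix`, `log_det`, `dpp_diversity`) are assumptions introduced here, and the toy embeddings are made up.

```python
import math

def gram_matrix(vectors):
    """Cosine-similarity kernel between K embedding vectors (assumed non-zero)."""
    unit = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        unit.append([x / norm for x in v])
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]

def log_det(matrix, jitter=1e-6):
    """Log-determinant of a symmetric PSD matrix via Gaussian elimination.

    A small jitter is added to the diagonal so that near-duplicate
    embeddings give a very negative score instead of a math error.
    """
    n = len(matrix)
    a = [[matrix[i][j] + (jitter if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    total = 0.0
    for k in range(n):
        pivot = a[k][k]
        if pivot <= 0.0:
            return float("-inf")
        total += math.log(pivot)
        for i in range(k + 1, n):
            factor = a[i][k] / pivot
            for j in range(k, n):
                a[i][j] -= factor * a[k][j]
    return total

def dpp_diversity(vectors):
    """Higher values mean the embeddings span more distinct directions."""
    return log_det(gram_matrix(vectors))

# Hypothetical embeddings of two candidate explanation subgraphs.
diverse = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]      # orthogonal
redundant = [[1.0, 0.0, 0.0], [0.99, 0.1, 0.0]]   # nearly identical

score_diverse = dpp_diversity(diverse)
score_redundant = dpp_diversity(redundant)
```

In this toy setting the orthogonal pair scores close to zero (the maximum for unit vectors), while the near-duplicate pair scores far lower, so adding such a term to a training objective would push explanations for the same gene pair apart.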
