Heterogeneous Causal Metapath Graph Neural Network for Gene-Microbe-Disease Association Prediction
Kexin Zhang, Feng Huang, Luotao Liu, Zhankun Xiong, Hongyu Zhang, Yuan Quan, Wen Zhang
TL;DR
This work tackles the prediction of triple-wise gene-microbe-disease (GMD) associations by introducing the Heterogeneous Causal Metapath Graph Neural Network (HCMGNN). HCMGNN builds a GMD heterogeneous graph, derives six directed causal subgraphs via causal metapaths, and learns multi-view embeddings through intra-subgraph causal semantic sharing followed by inter-subgraph attention-based fusion. HCMGNN outperforms a range of baselines in predictive performance. Ablations confirm the importance of causal metapaths, shared semantics, and feature information, and the model is particularly strong on sparse, low-degree triplets. The approach provides a principled framework for capturing directional, high-order interactions among genes, microbes, and diseases, and could guide experimental validation and accelerate the discovery of novel GMD associations.
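This summary does not list the six causal metapaths. One natural reading, stated here only as an assumption, is that they are the 3! = 6 orderings of the three entity types (for example gene → microbe → disease), each of which orients the undirected pairwise associations into a directed subgraph. The minimal Python sketch below illustrates that reading. The type labels, the `pair_edges` layout, and the choice to keep only the two hops named by each metapath are hypothetical and are not taken from the paper.

```python
from itertools import permutations

# Hypothetical node-type labels; the paper's own identifiers may differ.
ENTITY_TYPES = ("gene", "microbe", "disease")


def causal_metapaths(types=ENTITY_TYPES):
    """Assumption: the six causal metapaths are the 3! orderings of the types."""
    return list(permutations(types))


def directed_subgraph(metapath, pair_edges):
    """Orient pairwise associations along a metapath t0 -> t1 -> t2.

    pair_edges maps an unordered type pair (a frozenset) to a list of
    {type: node} dicts, one per observed association. Only the two hops
    named by the metapath are kept, which is an illustrative assumption.
    """
    directed = []
    for src_type, dst_type in zip(metapath, metapath[1:]):
        for assoc in pair_edges.get(frozenset((src_type, dst_type)), []):
            directed.append((assoc[src_type], assoc[dst_type]))
    return directed


# Toy pairwise associations (hypothetical data, for illustration only).
pair_edges = {
    frozenset(("gene", "microbe")): [{"gene": "g1", "microbe": "m1"}],
    frozenset(("microbe", "disease")): [{"microbe": "m1", "disease": "d1"}],
    frozenset(("gene", "disease")): [{"gene": "g1", "disease": "d1"}],
}

print(len(causal_metapaths()))  # 6
print(directed_subgraph(("gene", "microbe", "disease"), pair_edges))
# [('g1', 'm1'), ('m1', 'd1')]
print(directed_subgraph(("disease", "microbe", "gene"), pair_edges))
# [('d1', 'm1'), ('m1', 'g1')]
```

Under this reading, the same association (for example g1 and m1) appears with opposite directions in different subgraphs. That is what gives each view its own causal semantics.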
Abstract
The recent focus on microbes in human medicine highlights their potential role in the genetic framework of diseases. To decode the complex interactions among genes, microbes, and diseases, computational predictions of gene-microbe-disease (GMD) associations are crucial. Existing methods primarily address gene-disease and microbe-disease associations, but the more intricate triple-wise GMD associations remain less explored. In this paper, we propose a Heterogeneous Causal Metapath Graph Neural Network (HCMGNN) to predict GMD associations. HCMGNN constructs a heterogeneous graph linking genes, microbes, and diseases through their pairwise associations, and uses six predefined causal metapaths to extract directed causal subgraphs, which enable a multi-view analysis of the causal relations among the three entity types. Within each subgraph, we employ a causal semantic sharing message passing network for node representation learning, coupled with an attentive fusion method that integrates these representations to predict GMD associations. Our extensive experiments show that HCMGNN effectively predicts GMD associations and alleviates the association sparsity issue by enhancing the graph's semantics and structure.
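The abstract does not give the exact form of HCMGNN's attentive fusion or its triplet decoder. The sketch below assumes a semantic-level attention in the style of the Heterogeneous Graph Attention Network (HAN): each subgraph view receives a score equal to the mean over nodes of q · tanh(W h + b), a softmax over views yields weights beta, and each node's fused embedding is the beta-weighted sum of its per-view embeddings. The trilinear sigmoid scorer `triplet_score` is a hypothetical decoder added only to show how fused gene, microbe, and disease embeddings could be combined. The parameters `W`, `b`, and `q` are illustrative values, not learned ones.

```python
import math
from typing import List

Vector = List[float]


def matvec(W: List[Vector], x: Vector) -> Vector:
    """Dense matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_fusion(views, W, b, q):
    """HAN-style semantic attention over per-subgraph embeddings (assumed form).

    views[k][i] is the embedding of node i learned on causal subgraph k.
    """
    scores = []
    for view in views:
        # Per-node importance q . tanh(W h + b), averaged over the view's nodes.
        per_node = [
            sum(qj * math.tanh(zj + bj) for qj, zj, bj in zip(q, matvec(W, h), b))
            for h in view
        ]
        scores.append(sum(per_node) / len(per_node))
    beta = softmax(scores)  # one weight per subgraph view
    n_nodes, dim = len(views[0]), len(views[0][0])
    fused = [
        [sum(beta[k] * views[k][i][d] for k in range(len(views))) for d in range(dim)]
        for i in range(n_nodes)
    ]
    return fused, beta


def triplet_score(g: Vector, m: Vector, d: Vector) -> float:
    """Hypothetical decoder: sigmoid of the trilinear product sum_j g_j*m_j*d_j."""
    s = sum(gj * mj * dj for gj, mj, dj in zip(g, m, d))
    return 1.0 / (1.0 + math.exp(-s))


# Two toy views of the same two nodes (hypothetical values).
views = [
    [[1.0, 0.0], [0.0, 1.0]],  # view 1: informative embeddings
    [[0.0, 0.0], [0.0, 0.0]],  # view 2: all-zero embeddings
]
W = [[1.0, 0.0], [0.0, 1.0]]
fused, beta = attentive_fusion(views, W, b=[0.0, 0.0], q=[1.0, 1.0])
print(beta)   # the informative view receives the larger weight
print(fused)
print(triplet_score(fused[0], fused[1], fused[0]))
```

Averaging the per-node scores gives each causal view a single global weight. This keeps the number of attention weights equal to the number of subgraphs rather than growing with the graph size.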
