MOTGNN: Interpretable Graph Neural Networks for Multi-Omics Disease Classification
Tiantian Yang, Zhiqian Chen
TL;DR
MOTGNN tackles small-sample, high-dimensional, multi-omics disease prediction by building a supervised graph for each omics modality from XGBoost trees and learning embeddings with a GEDFN-based GNN on each graph. A deep feedforward network then fuses these embeddings for binary classification, while providing end-to-end interpretability through feature- and omics-level importance scores. Across TCGA cancer datasets, MOTGNN consistently outperforms baselines and remains robust under class imbalance, and its importance scores indicate which modalities and biomarkers drive predictions. The framework offers a scalable, interpretable approach for integrating heterogeneous omics data to enhance disease inference and biomarker discovery.
Abstract
Integrating multi-omics data, such as DNA methylation, mRNA expression, and microRNA (miRNA) expression, offers a comprehensive view of the biological mechanisms underlying disease. However, the high dimensionality of multi-omics data, the heterogeneity across modalities, and the lack of reliable biological interaction networks make meaningful integration challenging. In addition, many existing models rely on handcrafted similarity graphs, are vulnerable to class imbalance, and often lack built-in interpretability, limiting their usefulness in biomedical applications. We propose Multi-Omics integration with Tree-generated Graph Neural Network (MOTGNN), a novel and interpretable framework for binary disease classification. MOTGNN employs eXtreme Gradient Boosting (XGBoost) for omics-specific supervised graph construction, followed by modality-specific Graph Neural Networks (GNNs) for hierarchical representation learning, and a deep feedforward network for cross-omics integration. Across three real-world disease datasets, MOTGNN outperforms state-of-the-art baselines by 5-10% in accuracy, ROC-AUC, and F1-score, and remains robust to severe class imbalance. The model maintains computational efficiency through the use of sparse graphs and provides built-in interpretability, revealing both top-ranked biomarkers and the relative contributions of each omics modality. These results highlight the potential of MOTGNN to improve both predictive accuracy and interpretability in multi-omics disease modeling.
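To make the idea of tree-generated, omics-specific graphs concrete, the following is a minimal sketch of one plausible construction, not the authors' implementation. It assumes that two features are linked whenever one is the split feature of a node and the other is the split feature of a direct child node in some boosted tree, and that edges are undirected. The nested-dict tree format, the function names `tree_edges` and `forest_graph`, and the CpG-style feature names are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical tree format (an assumption for this sketch):
# a split node is {"feature": str, "left": node_or_None, "right": node_or_None};
# a leaf is represented by None.


def tree_edges(node, edges=None):
    """Collect (parent_feature, child_feature) pairs from one tree."""
    if edges is None:
        edges = set()
    if node is None:
        return edges
    for child in (node.get("left"), node.get("right")):
        if child is not None:
            # Skip self-loops when a feature is split on twice in a row.
            if child["feature"] != node["feature"]:
                edges.add((node["feature"], child["feature"]))
            tree_edges(child, edges)
    return edges


def forest_graph(trees):
    """Merge parent-child edges from all trees into an undirected adjacency map."""
    adjacency = defaultdict(set)
    for tree in trees:
        for parent, child in tree_edges(tree):
            adjacency[parent].add(child)
            adjacency[child].add(parent)  # undirected edge (assumption)
    return {feature: sorted(neighbors) for feature, neighbors in adjacency.items()}


# Toy "forest" for a single omics modality (feature names are made up).
leaf = None
trees = [
    {"feature": "cg001",
     "left": {"feature": "cg007", "left": leaf, "right": leaf},
     "right": leaf},
    {"feature": "cg007",
     "left": {"feature": "cg042", "left": leaf, "right": leaf},
     "right": {"feature": "cg001", "left": leaf, "right": leaf}},
]

graph = forest_graph(trees)
for feature, neighbors in sorted(graph.items()):
    print(feature, "->", neighbors)
```

Only features that take part in a parent-child split pair appear as nodes, which is one way a tree-derived graph can stay sparse relative to the full feature set. With a fitted XGBoost model, the tree structure would first have to be extracted, for example from the table returned by `Booster.trees_to_dataframe()`, and converted into whatever representation the graph builder expects.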
