Knowledge Graph Sparsification for GNN-based Rare Disease Diagnosis
Premt Cara, Kamilia Zaripova, David Bani-Harouni, Nassir Navab, Azade Farshad
TL;DR
RareNet tackles the challenge of rare genetic disease diagnosis in data-scarce settings by using only patient phenotypes to extract a patient-specific subgraph from a biomedical knowledge graph and to rank candidate causal genes. It combines phenotype-centered subgraph sampling, Graph Attention Network processing, and joint subgraph and gene scoring with a teacher–student training objective, enabling robust performance even with noisy phenotypes. The method can operate as a standalone diagnostic tool or as a pre-/post-processing filter to improve existing gene-prioritization frameworks, and it demonstrates competitive results on simulated data and real-world MyGene2 data, along with consistent improvements when combined with other methods. This phenotype-driven KG approach has the potential to democratize access to advanced genetic analysis in resource-limited settings and offers interpretable subgraphs that can guide clinical investigation and validation.
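To make the idea of phenotype-centered subgraph sampling concrete, the following is a minimal sketch, not the authors' implementation. It assumes the knowledge graph is an undirected adjacency map and that sampling means collecting every node within a fixed number of hops of the patient's phenotypes. The function name `extract_subgraph`, the `max_hops` parameter, and all node identifiers are hypothetical placeholders (they are not real HPO or gene database IDs). The GAT processing and learned scoring described above are not modeled here.

```python
from collections import deque


def extract_subgraph(adjacency, phenotypes, max_hops=2):
    """Breadth-first search from the patient's phenotype nodes.

    Returns a dict mapping each reached node to its hop distance from
    the nearest phenotype. This is an assumed sampling rule for
    illustration only, not the procedure used by RareNet.
    """
    # Seed the search with phenotypes that actually exist in the graph.
    visited = {p: 0 for p in phenotypes if p in adjacency}
    queue = deque(visited)
    while queue:
        node = queue.popleft()
        depth = visited[node]
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited[neighbor] = depth + 1
                queue.append(neighbor)
    return visited


# Toy knowledge graph with made-up node names (hypothetical data).
edges = [
    ("HP:seizure", "GENE:SCN1A"),
    ("HP:ataxia", "GENE:SCN1A"),
    ("HP:ataxia", "GENE:ATM"),
    ("GENE:ATM", "DISEASE:AT"),
    ("HP:deafness", "GENE:GJB2"),
]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

# One-hop patient subgraph for a patient with two observed phenotypes.
subgraph = extract_subgraph(adjacency, ["HP:seizure", "HP:ataxia"], max_hops=1)
candidate_genes = sorted(n for n in subgraph if n.startswith("GENE:"))
```

In this toy graph the one-hop subgraph contains only the two genes linked to the observed phenotypes, so the unrelated gene never enters the candidate set. A learned model would then score the genes inside such a subgraph rather than all genes in the knowledge graph.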
Abstract
Rare genetic disease diagnosis faces critical challenges: insufficient patient data, inaccessible full genome sequencing, and the immense number of possible causative genes. These limitations cause prolonged diagnostic journeys, inappropriate treatments, and critical delays, disproportionately affecting patients in resource-limited settings where diagnostic tools are scarce. We propose RareNet, a subgraph-based Graph Neural Network that requires only patient phenotypes to identify the most likely causal gene and retrieve focused patient subgraphs for targeted clinical investigation. RareNet can function as a standalone method or serve as a pre-processing or post-processing filter for other candidate gene prioritization methods, consistently enhancing their performance while potentially enabling explainable insights. Through comprehensive evaluation on two biomedical datasets, we demonstrate competitive and robust causal gene prediction and significant performance gains when integrated with other frameworks. By requiring only phenotypic data, which is readily available in any clinical setting, RareNet democratizes access to sophisticated genetic analysis, offering particular value for underserved populations lacking advanced genomic infrastructure.
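The two integration modes mentioned above can be pictured with a short sketch. It is an illustration under stated assumptions, not the paper's procedure: pre-processing is assumed to mean restricting another method's candidate list to genes found in the patient subgraph, and post-processing is assumed to mean re-sorting another method's ranking by a per-gene subgraph score. The helper names `prefilter` and `rerank`, the gene lists, and the scores are all hypothetical.

```python
def prefilter(candidates, subgraph_genes):
    """Pre-processing (assumed form): keep only candidates that appear
    in the patient subgraph, preserving their original order."""
    return [g for g in candidates if g in subgraph_genes]


def rerank(ranked_genes, subgraph_scores, default=0.0):
    """Post-processing (assumed form): re-sort another method's ranking
    by descending subgraph score. Python's sort is stable, so genes with
    equal scores keep the order given by the other method."""
    return sorted(ranked_genes, key=lambda g: -subgraph_scores.get(g, default))


# Hypothetical output of an external gene-prioritization method.
other_ranking = ["GJB2", "ATM", "SCN1A", "TTN"]

# Hypothetical per-gene scores from a subgraph model; genes outside the
# patient subgraph have no score.
subgraph_scores = {"SCN1A": 0.9, "ATM": 0.4}

filtered = prefilter(other_ranking, subgraph_scores.keys())
reranked = rerank(other_ranking, subgraph_scores)
```

Filtering shrinks the search space before the other method runs, while re-ranking keeps every gene but moves those supported by the patient subgraph toward the top.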
