Beyond Traditional Diagnostics: Transforming Patient-Side Information into Predictive Insights with Knowledge Graphs and Prototypes

Yibowen Zhao, Yinan Zhang, Zhixiang Su, Lizhen Cui, Chunyan Miao

TL;DR

The paper tackles disease prediction from patient-side information. To address data imbalance and limited interpretability, it introduces KPI, a framework that combines a knowledge-grounded disease graph, prototype-guided contrastive learning, and LLM-based explanations. KPI constructs a unified disease knowledge graph, derives patient-specific subgraphs, and aligns narrative embeddings with disease prototypes while enforcing cross-modal consistency. Experiments on Haodf show that KPI outperforms baselines in accuracy and provides clinically valid, patient-tailored explanations, with particular strength on long-tail diseases and efficient inference. The work advances patient-centered triage by delivering interpretable, knowledge-grounded predictions that can support clinicians and patients in early, informed decision-making.

Abstract

Predicting diseases solely from patient-side information, such as demographics and self-reported symptoms, has attracted significant research attention due to its potential to enhance patient awareness, facilitate early healthcare engagement, and improve healthcare system efficiency. However, existing approaches encounter critical challenges, including imbalanced disease distributions and a lack of interpretability, resulting in biased or unreliable predictions. To address these issues, we propose the Knowledge graph-enhanced, Prototype-aware, and Interpretable (KPI) framework. KPI systematically integrates structured and trusted medical knowledge into a unified disease knowledge graph, constructs clinically meaningful disease prototypes, and employs contrastive learning to enhance predictive accuracy, which is particularly important for long-tailed diseases. Additionally, KPI utilizes large language models (LLMs) to generate patient-specific, medically relevant explanations, thereby improving interpretability and reliability. Extensive experiments on real-world datasets demonstrate that KPI outperforms state-of-the-art methods in predictive accuracy and provides clinically valid explanations that closely align with patient narratives, highlighting its practical value for patient-centered healthcare delivery.

Paper Structure

This paper contains 32 sections, 14 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: Challenges for Current Patient Disease Prediction.
  • Figure 2: The Architecture of KPI. KPI builds a unified disease KG from authoritative descriptions and initializes disease prototypes via graph encoding. For each case, a transformer encodes patient narratives and retrieves a personalized subgraph; contrastive learning aligns the text-based embedding with the correct prototypes, and a semantic consistency regularizer enforces agreement between the text- and graph-based patient embeddings. At inference (dashed arrows), diseases are predicted by prototype–text similarity, and explanations are generated from the narrative together with its retrieved subgraph.
  • Figure 3: Stage 1: Prompt Template for Extracting Structured Knowledge Triplets from Disease Descriptions.
  • Figure 4: Stage 2: Prompt Template for Defining and Canonicalizing Relation Types Across Diseases.
  • Figure 5: Prompt Template for Generating Patient-Specific Explanations Using Retrieved Subgraphs and Self-Reported Narratives.
  • ...and 4 more figures
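
The training and inference flow described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' implementation: it assumes cosine similarity, a softmax (InfoNCE-style) contrastive objective over prototypes with a single correct disease per case, and a "1 minus cosine" consistency penalty. The function names, the temperature, the 0.5 weight, and the toy 2-dimensional vectors are all hypothetical; the paper's actual losses may differ.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def prototype_probs(text_emb, prototypes, temperature=0.1):
    """Softmax over temperature-scaled similarities to each disease prototype."""
    logits = [cosine(text_emb, p) / temperature for p in prototypes]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_loss(text_emb, prototypes, label, temperature=0.1):
    """Assumed objective: negative log-probability of the correct prototype."""
    return -math.log(prototype_probs(text_emb, prototypes, temperature)[label])

def consistency_penalty(text_emb, graph_emb):
    """Assumed regularizer: zero when text and graph embeddings point the same way."""
    return 1.0 - cosine(text_emb, graph_emb)

def predict(text_emb, prototypes):
    """Inference: index of the prototype most similar to the narrative embedding."""
    sims = [cosine(text_emb, p) for p in prototypes]
    return max(range(len(sims)), key=sims.__getitem__)

# Toy data (hypothetical): three disease prototypes and one patient case.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
text_emb = [0.9, 0.1]    # stands in for the transformer-encoded narrative
graph_emb = [0.8, 0.3]   # stands in for the encoded patient subgraph

# Combined training signal; the 0.5 weight is an arbitrary illustrative choice.
total_loss = (contrastive_loss(text_emb, prototypes, label=0)
              + 0.5 * consistency_penalty(text_emb, graph_emb))
predicted = predict(text_emb, prototypes)
```

In this sketch only the text embedding is needed at prediction time, which mirrors the caption's statement that diseases are predicted by prototype–text similarity; the graph embedding contributes only through the training-time penalty.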