Knowledge Graph Augmented Large Language Models for Disease Prediction

Ruiyu Wang, Tuan Vinh, Ran Xu, Yuyin Zhou, Jiaying Lu, Carl Yang, Francisco Pasquel

TL;DR

Prediction models built on electronic health records (EHRs) can be accurate, but they rarely offer patient-specific, clinically grounded explanations. The authors ground chain-of-thought (CoT) reasoning in a biomedical knowledge graph (PrimeKG). They map ICD-9 codes to KG nodes, select disease-relevant nodes and reasoning paths, and generate KG-anchored CoT explanations, keeping only those whose conclusions agree with the observed label. Small open-weight LLMs fine-tuned on this KG-guided supervision perform competitively on ten PrimeKG-mapped diseases with limited training data and transfer zero-shot to a separate cohort. In a clinician evaluation, KG-guided CoT traces were rated clearer, more relevant, and more clinically sound than those of untuned baselines, which suggests practical value as a clinician-facing reasoning layer. Overall, KG-anchored CoT offers data-efficient, interpretable prognostic reasoning that generalizes across cohorts and can complement traditional risk models in clinical decision support.
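The code-mapping and path-extraction steps can be illustrated with a minimal sketch. It assumes a hypothetical ICD-9-to-node dictionary (`ICD9_TO_NODE`) and a toy graph (`TOY_KG`) standing in for PrimeKG. The function names and the breadth-first, hop-bounded path search are illustrative choices, not the authors' implementation, which also prunes and selects paths with a dedicated prompt.

```python
from collections import deque

# Hypothetical ICD-9 -> KG node mapping (illustrative only; not the paper's mapping).
ICD9_TO_NODE = {
    "250.00": "type 2 diabetes mellitus",
    "401.9": "hypertension",
}

# Toy undirected KG as an adjacency list of (neighbor, relation) pairs.
# This is a stand-in for PrimeKG, not real PrimeKG content.
TOY_KG = {
    "type 2 diabetes mellitus": [("hypertension", "comorbid_with"),
                                 ("insulin resistance", "phenotype")],
    "hypertension": [("type 2 diabetes mellitus", "comorbid_with"),
                     ("chronic kidney disease", "risk_factor_for")],
    "insulin resistance": [("type 2 diabetes mellitus", "phenotype")],
    "chronic kidney disease": [("hypertension", "risk_factor_for")],
}


def map_codes(codes):
    """Map a visit's ICD-9 codes to KG nodes, dropping codes with no mapping."""
    return [ICD9_TO_NODE[c] for c in codes if c in ICD9_TO_NODE]


def paths_to_target(graph, source, target, max_hops=2):
    """Return every simple path from source to target with at most max_hops edges (BFS)."""
    results = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target and len(path) > 1:
            results.append(path)
            continue
        if len(path) - 1 == max_hops:  # hop budget exhausted
            continue
        for neighbor, _relation in graph.get(node, []):
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                queue.append(path + [neighbor])
    return results
```

For example, `map_codes(["250.00", "401.9", "999.9"])` drops the unmapped code and returns the two mapped nodes. A two-hop search from `"type 2 diabetes mellitus"` to `"chronic kidney disease"` returns the single path through `"hypertension"`. Bounding the hop count is one simple way to keep the number of candidate paths manageable on a large graph.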

Abstract

Electronic health records (EHRs) support powerful clinical prediction models, but existing methods typically provide coarse, post hoc explanations that offer limited value for patient-level decision making. We introduce a knowledge graph (KG)-guided chain-of-thought (CoT) framework that generates clinically grounded and temporally consistent reasoning for visit-level disease prediction in MIMIC-III. ICD-9 codes are mapped to PrimeKG, from which disease-relevant nodes and multi-hop reasoning paths are extracted and used as scaffolds for CoT generation; only explanations whose conclusions match observed outcomes are retained. Lightweight LLaMA-3.1-Instruct-8B and Gemma-7B models are then fine-tuned on this supervision corpus. Across ten PrimeKG-mapped diseases and limited training cohorts (400 and 1000 cases), KG-guided models outperform strong classical baselines, achieving AUROC values of 0.66 to 0.70 and macro-AUPR values of 0.40 to 0.47. The models also transfer zero-shot to the CRADLE cohort, raising accuracy from a range of approximately 0.40 to 0.51 to a range of 0.72 to 0.77. A blinded clinician evaluation shows consistent preference for KG-guided CoT explanations in clarity, relevance, and clinical correctness.
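Macro-AUPR is, by the usual definition, the unweighted mean of per-disease AUPR. A dependency-free sketch of how AUROC and a macro-averaged metric can be computed follows. It assumes binary labels per disease and computes AUPR as step-wise average precision, which is one common estimator; the exact estimator used in the paper is not stated here. The helper names (`auroc`, `average_precision`, `macro`) are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise AUPR: mean precision at the rank of each positive.

    Assumes no tied scores; ties are broken by input order (stable sort).
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("average precision needs at least one positive")
    hits, total = 0, 0.0
    for rank, (_score, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


def macro(metric, per_disease):
    """Unweighted mean of a metric over diseases; per_disease maps name -> (labels, scores)."""
    values = [metric(y, s) for y, s in per_disease.values()]
    return sum(values) / len(values)
```

Macro averaging weights every disease equally regardless of prevalence, so rare diseases influence the summary as much as common ones.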

Paper Structure

This paper contains 10 sections, 6 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: General pipeline for KG-guided CoT data generation.
  • Figure 2: Schematic and example of KG-guided CoT generation and filtering.
  • Figure 3: Clinician preferences for reasoning quality.
  • Figure 4: Prompt templates used in our KG-guided CoT pipeline: (top-left) disease-relevant node selection, (top-right) KG path pruning and selection, (bottom) CoT generation conditioned on KG evidence and visit features.
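Figures 2 and 4 cover prompt construction and label-consistency filtering. The sketch below assumes a hypothetical prompt template and a hypothetical output convention in which each generated explanation ends with `Answer: Yes` or `Answer: No`. It keeps an explanation only when that conclusion matches the observed outcome, mirroring the filtering rule stated in the abstract. It does not reproduce the authors' actual prompts or parser.

```python
import re

# Hypothetical template (not the paper's Figure 4 prompt).
PROMPT_TEMPLATE = (
    "Visit features: {features}\n"
    "Knowledge-graph evidence:\n{paths}\n"
    "Question: will the patient be diagnosed with {disease} at the next visit? "
    "Reason step by step, then end with 'Answer: Yes' or 'Answer: No'."
)

# Matches a trailing "Answer: Yes/No" (case-insensitive) at the very end of the text.
ANSWER_RE = re.compile(r"Answer:\s*(Yes|No)\s*$", re.IGNORECASE)


def build_prompt(features, kg_paths, disease):
    """Render each KG path as 'a -> b -> c' on its own line and fill the template."""
    rendered = "\n".join(" -> ".join(p) for p in kg_paths)
    return PROMPT_TEMPLATE.format(features=", ".join(features), paths=rendered, disease=disease)


def parse_answer(cot_text):
    """Return 1 for a final 'Yes', 0 for a final 'No', or None if no conclusion is found."""
    match = ANSWER_RE.search(cot_text.strip())
    if match is None:
        return None
    return 1 if match.group(1).lower() == "yes" else 0


def filter_label_consistent(samples):
    """Keep (cot_text, label) pairs whose parsed conclusion equals the observed label."""
    return [(cot, y) for cot, y in samples if parse_answer(cot) == y]
```

Explanations with no parseable conclusion are discarded along with contradicting ones, so only traces that reach the observed outcome enter the fine-tuning corpus.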