medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs

Mingyi Jia, Junwen Duan, Yan Song, Jianxin Wang

TL;DR

medIKAL addresses EMR-based clinical diagnosis by combining LLM capabilities with a weighted knowledge-graph (KG) search. It assigns entity-type weights $w_{t}$ to localize candidate diseases in the KG, uses a residual-like integration to merge the LLM's initial diagnosis with the KG-derived candidates, and then reranks the candidates by the shortest-path distance $\text{dist}(D_i, e_j)$ between a candidate disease $D_i$ and an entity $e_j$. A KG knowledge reconstruction step produces a semi-structured input for the LLM via fill-in-the-blank prompts, and a threshold $\theta$ (set to 60% of the total score) guides the choice of final diagnoses. Experiments on the open Chinese EMR dataset CMEMR and on supplementary EMR datasets show that medIKAL outperforms strong baselines and remains robust across LLM backbones and data conditions. The authors note limitations related to data sparsity and the handling of numerical indicators.
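The candidate localization, residual-like merge, and path-based reranking described above can be pictured with a small sketch. This is not the authors' implementation: the toy graph, the weight values in `TYPE_WEIGHTS`, the function names, and the scoring rule $w_t / (1 + \text{dist})$ are all assumptions chosen for illustration, and the threshold $\theta$ is deliberately left out because the summary does not specify how it is applied.

```python
from collections import deque

# Hypothetical entity-type weights w_t (illustrative values, not from the paper).
TYPE_WEIGHTS = {"symptom": 1.0, "exam": 2.0, "history": 0.5}

# Toy undirected KG as an adjacency map; every node here is made up.
KG = {
    "fever": {"influenza", "pneumonia"},
    "cough": {"influenza", "pneumonia", "bronchitis"},
    "lung_opacity": {"pneumonia"},
    "influenza": {"fever", "cough"},
    "pneumonia": {"fever", "cough", "lung_opacity"},
    "bronchitis": {"cough"},
}
DISEASES = {"influenza", "pneumonia", "bronchitis"}


def shortest_dist(graph, src, dst):
    """Breadth-first shortest-path length; None if dst is unreachable."""
    if src == dst:
        return 0
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return d + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


def localize(entities):
    """Score each disease by summing w_t over EMR entities adjacent to it."""
    scores = {}
    for name, etype in entities:
        for neighbor in KG.get(name, ()):
            if neighbor in DISEASES:
                scores[neighbor] = scores.get(neighbor, 0.0) + TYPE_WEIGHTS[etype]
    return scores


def merge_llm_guesses(kg_scores, llm_guesses):
    """Residual-like merge (assumed form): LLM guesses join the candidate
    pool even when the KG search did not surface them."""
    merged = dict(kg_scores)
    for disease in llm_guesses:
        merged.setdefault(disease, 0.0)
    return merged


def rerank(candidates, entities):
    """Assumed reranking rule: sum of w_t / (1 + dist(D_i, e_j))."""
    ranked = {}
    for disease in candidates:
        total = 0.0
        for name, etype in entities:
            dist = shortest_dist(KG, disease, name)
            if dist is not None:
                total += TYPE_WEIGHTS[etype] / (1 + dist)
        ranked[disease] = total
    return sorted(ranked.items(), key=lambda kv: (-kv[1], kv[0]))


entities = [("fever", "symptom"), ("cough", "symptom"), ("lung_opacity", "exam")]
candidates = merge_llm_guesses(localize(entities), ["bronchitis", "tuberculosis"])
ranking = rerank(candidates, entities)
```

In this toy run, "tuberculosis" enters the pool only through the LLM's guess and scores zero because it has no node in the graph, which shows why the merge step and the reranking step are kept separate.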

Abstract

Electronic Medical Records (EMRs), while integral to modern healthcare, present challenges for clinical reasoning and diagnosis due to their complexity and information redundancy. To address this, we propose medIKAL (Integrating Knowledge Graphs as Assistants of LLMs), a framework that combines Large Language Models (LLMs) with knowledge graphs (KGs) to enhance diagnostic capabilities. medIKAL assigns weighted importance to entities in medical records based on their type, enabling precise localization of candidate diseases within KGs. It employs a residual network-like approach that allows the LLM's initial diagnosis to be merged into the KG search results. A path-based reranking algorithm and a fill-in-the-blank style prompt template further refine the diagnostic process. We validate medIKAL's effectiveness through extensive experiments on a newly introduced open-source Chinese EMR dataset, demonstrating its potential to improve clinical diagnosis in real-world settings.
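To make the fill-in-the-blank idea concrete, here is a minimal sketch of how KG evidence for the reranked candidates might be rendered into a semi-structured prompt. The template wording, the `[BLANK]` marker, and the function name `build_cloze_prompt` are hypothetical; the paper's actual template is not reproduced here.

```python
def build_cloze_prompt(emr_summary, candidates, kg_facts):
    """Render candidates and their KG evidence as a fill-in-the-blank prompt.

    emr_summary: short free-text summary of the record
    candidates:  disease names, already reranked
    kg_facts:    dict mapping disease name -> list of evidence strings
    """
    lines = ["Patient record summary:", emr_summary, "", "Knowledge graph evidence:"]
    for disease in candidates:
        facts = kg_facts.get(disease, [])
        evidence = "; ".join(facts) if facts else "no matching KG evidence"
        lines.append(f"- {disease}: {evidence}")
    lines += [
        "",
        "Complete the blanks:",
        "The most likely diagnosis is [BLANK] because [BLANK].",
    ]
    return "\n".join(lines)


prompt = build_cloze_prompt(
    "Adult with fever, cough, and lung opacity on imaging.",
    ["pneumonia", "influenza"],
    {"pneumonia": ["typical symptom fever", "exam finding lung opacity"]},
)
```

Constraining the answer to blanks, rather than asking an open question, is one way to keep the model's output tied to the listed candidates and evidence.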

Paper Structure

This paper contains 38 sections, 5 equations, 7 figures, 14 tables, and 2 algorithms.

Figures (7)

  • Figure 1: Limitations of existing KG-augmented LLM methods when applied to EMR diagnostic tasks. ① Using subgraphs/triplets to augment context. ② Using reasoning chains to augment context. ③ Using an iteration-based approach to involve LLMs in KG searching and reasoning.
  • Figure 2: The overall workflow of medIKAL. It contains three main modules: Module 1. Preprocessing before KG Search (A, B, and C.1); Module 2. Candidate Disease Localization and Reranking via KG (C.2 and D); Module 3. Collaborative Reasoning for LLMs and KG (E).
  • Figure 3: An illustration of how the reranking process is combined with the knowledge construction process.
  • Figure 4: Evaluation results for how well medIKAL and the baseline methods utilize the LLM's internal knowledge. "Retained" denotes that useful diagnoses from the LLM's original predictions are kept in the final results, and "Lost" denotes the opposite.
  • Figure 5: A data example from CMEMR.
  • ...and 2 more figures