medIKAL: Integrating Knowledge Graphs as Assistants of LLMs for Enhanced Clinical Diagnosis on EMRs
Mingyi Jia, Junwen Duan, Yan Song, Jianxin Wang
TL;DR
medIKAL tackles the challenge of EMR-based clinical diagnosis by marrying LLM capabilities with a weighted knowledge-graph search. It introduces weighted entity-type scores $w_{t}$ to localize candidate diseases in the KG, and uses a residual-like integration to blend the LLM’s initial diagnosis with KG-derived candidates, followed by a path-based reranking using the shortest-path distance $\text{dist}(D_i, e_j)$. A KG knowledge reconstruction step yields a semi-structured input for the LLM via fill-in-the-blank prompts, guided by a threshold $\theta$ (set to 60% of the total score) to decide final diagnoses. Experiments on the open Chinese EMR dataset CMEMR and supplementary EMR datasets show that medIKAL outperforms strong baselines and demonstrates robustness across backbones and data conditions, highlighting its potential for practical AI-assisted clinical diagnosis while noting limitations related to data sparsity and numerical indicator handling.
Abstract
Electronic Medical Records (EMRs), while integral to modern healthcare, present challenges for clinical reasoning and diagnosis due to their complexity and information redundancy. To address this, we propose medIKAL (Integrating Knowledge Graphs as Assistants of LLMs), a framework that combines Large Language Models (LLMs) with knowledge graphs (KGs) to enhance diagnostic capabilities. medIKAL assigns weighted importance to entities in medical records based on their type, enabling precise localization of candidate diseases within KGs. It employs a residual network-like approach that merges the LLM's initial diagnosis into the KG search results. A path-based reranking algorithm and a fill-in-the-blank style prompt template then further refine the diagnostic process. We validate medIKAL's effectiveness through extensive experiments on a newly introduced open-source Chinese EMR dataset, demonstrating its potential to improve clinical diagnosis in real-world settings.
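The pipeline summarized above (type-weighted candidate scoring, a residual-like merge with the LLM's initial diagnosis, and shortest-path reranking) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The weight values, the toy KG, every function name, the reading of $\theta$ as a fraction of the record's total entity weight, and the use of summed shortest-path distances as the ranking key are all assumptions made for illustration.

```python
from collections import defaultdict, deque

# Hypothetical entity-type weights w_t; the values are illustrative only.
TYPE_WEIGHTS = {"symptom": 1.0, "history": 0.5}


def build_graph(kg_links):
    """Turn entity -> diseases links into an undirected adjacency map."""
    graph = defaultdict(set)
    for entity, diseases in kg_links.items():
        for disease in diseases:
            graph[entity].add(disease)
            graph[disease].add(entity)
    return dict(graph)


def score_candidates(entities, kg_links, type_weights):
    """Sum w_t over the record entities linked to each disease."""
    scores = defaultdict(float)
    for name, etype in entities:
        for disease in kg_links.get(name, ()):
            scores[disease] += type_weights.get(etype, 0.0)
    return dict(scores)


def filter_by_threshold(scores, entities, type_weights, theta=0.6):
    """Keep diseases scoring at least theta * total weight (assumed reading)."""
    total = sum(type_weights.get(etype, 0.0) for _, etype in entities)
    return [d for d, s in scores.items() if s >= theta * total]


def merge_with_llm(kg_candidates, llm_candidates):
    """Residual-like merge: LLM guesses survive next to the KG results."""
    merged = list(kg_candidates)
    for disease in llm_candidates:
        if disease not in merged:
            merged.append(disease)
    return merged


def shortest_dist(graph, src, dst):
    """Breadth-first shortest-path length; None if dst is unreachable."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, depth = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == dst:
                return depth + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None


def rerank(candidates, entities, graph, unreachable=10):
    """Sort diseases D_i by the sum of dist(D_i, e_j) over record entities."""
    def total_dist(disease):
        dists = (shortest_dist(graph, disease, name) for name, _ in entities)
        return sum(unreachable if d is None else d for d in dists)
    return sorted(candidates, key=total_dist)
```

In this sketch a disease proposed only by the LLM still enters the candidate list, which is the point of the residual-like merge, and the reranking step then orders it against the KG-derived candidates by graph proximity to the record's entities.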
