Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine
Chengfeng Dou, Ying Zhang, Zhi Jin, Wenpin Jiao, Haiyan Zhao, Yongqiang Zhao, Zhengwei Tao
TL;DR
This work tackles the challenge of collecting and organizing dispersed medical evidence for evidence-based medicine (EBM) in LLM workflows. It introduces EbmKG, a knowledge hypergraph that represents multivariate medical evidence through entities, topics, and evidence items, and IDEP, an algorithm that prioritizes evidence within identified topics for retrieval-augmented generation (RAG). On six benchmarks spanning medical QA, hallucination detection, and clinical decision support, IdepRAG consistently outperforms VectorRAG and GraphRAG in both generation and retrieval tasks. It locates relevant topics with a random walk and uses LLM-derived evidence features to guide evidence selection. The authors open-source the large-scale EbmKG and the evaluation benchmarks to facilitate future research on RAG for EBM, highlighting practical impacts on accuracy, safety, and resource efficiency in medical AI systems.
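To make the hypergraph idea concrete, the following is a minimal sketch of how evidence that links more than two entities could be stored and grouped by topic. It is not the authors' EbmKG schema: the class names `Evidence` and `Hypergraph`, the `topics_for` helper, and the overlap-count ranking are all assumptions introduced for illustration, and the overlap count is a simple stand-in for the random-walk topic locating mentioned above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Evidence:
    """One hyperedge: a statement that links any number of entities (hypothetical type)."""
    text: str
    entities: frozenset


@dataclass
class Hypergraph:
    """Toy hypergraph: topic name -> evidence hyperedges filed under that topic."""
    topics: dict = field(default_factory=dict)

    def add(self, topic, text, entities):
        # Store the evidence as a hyperedge over an arbitrary set of entities.
        ev = Evidence(text, frozenset(entities))
        self.topics.setdefault(topic, []).append(ev)
        return ev

    def topics_for(self, query_entities):
        """Rank topics by how many query entities their evidence covers.

        Assumption: overlap counting replaces the random walk used in the paper.
        """
        query = set(query_entities)
        scores = {}
        for topic, evidences in self.topics.items():
            covered = set()
            for ev in evidences:
                covered |= ev.entities & query
            if covered:
                scores[topic] = len(covered)
        # Highest coverage first; ties broken alphabetically for determinism.
        return sorted(scores, key=lambda t: (-scores[t], t))
```

A binary knowledge graph would need several edges to express one statement about three entities; here a single `Evidence` object holds the whole set, which is the property the hypergraph representation is meant to capture.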
Abstract
Evidence-based medicine (EBM) plays a crucial role in the application of large language models (LLMs) in healthcare, as it provides reliable support for medical decision-making. Although EBM benefits from current retrieval-augmented generation (RAG) technologies, it still faces two significant challenges: collecting dispersed evidence, and organizing this evidence efficiently to support the complex queries that EBM requires. To tackle these issues, we propose using LLMs to gather scattered evidence from multiple sources, and we present a knowledge hypergraph-based evidence management model that integrates this evidence while capturing its intricate relationships. Furthermore, to better support complex queries, we have developed an Importance-Driven Evidence Prioritization (IDEP) algorithm that uses the LLM to generate multiple evidence features, each with an associated importance score, which are then used to rank the evidence and produce the final retrieval results. Experimental results on six datasets demonstrate that our approach outperforms existing RAG techniques in application domains of interest to EBM, such as medical question answering, hallucination detection, and decision support. The test sets and the constructed knowledge graph can be accessed at https://drive.google.com/file/d/1WJ9QTokK3MdkjEmwuFQxwH96j_Byawj_/view?usp=drive_link.
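The ranking step described above can be illustrated with a small sketch. It assumes that each evidence feature is a keyword with a numeric importance score and that an evidence item matches a feature when the keyword appears in its text. Both are simplifications introduced here: in the paper the features and scores come from an LLM, and the abstract does not specify the matching rule. The function name `rank_evidence` and the `top_k` parameter are hypothetical.

```python
def rank_evidence(evidence, features, top_k=2):
    """Rank evidence by the total importance of the features each item matches.

    evidence: list of evidence strings.
    features: list of (keyword, importance) pairs. Hard-coded here; assumed to be
              LLM-generated in the actual method.
    Matching is case-insensitive substring search, which is an assumption.
    """
    scored = []
    for text in evidence:
        lowered = text.lower()
        # Sum the importance of every feature found in this evidence item.
        score = sum(weight for keyword, weight in features if keyword.lower() in lowered)
        scored.append((score, text))
    # Sort by score, highest first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: -pair[0])
    # Keep the top_k items and drop any that matched no feature at all.
    return [text for score, text in scored[:top_k] if score > 0]
```

Weighting by importance lets an item that matches one highly important feature outrank an item that matches several minor ones, which is the behavior the term "importance-driven" suggests.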
