Enhancing LLM Generation with Knowledge Hypergraph for Evidence-Based Medicine

Chengfeng Dou, Ying Zhang, Zhi Jin, Wenpin Jiao, Haiyan Zhao, Yongqiang Zhao, Zhengwei Tao

TL;DR

This work tackles the challenge of collecting and organizing dispersed medical evidence for evidence-based medicine (EBM) in LLM workflows. It introduces EbmKG, a knowledge hypergraph that represents multivariate medical evidence through entities, topics, and evidence nodes, and the IDEP algorithm, which prioritizes evidence within identified topics for retrieval-augmented generation (RAG). On six benchmarks spanning medical QA, hallucination detection, and clinical decision support, IdepRAG consistently outperforms VectorRAG and GraphRAG in both generation and retrieval tasks. It locates relevant topics with a random walk and uses LLM-derived evidence features to guide evidence selection. The authors open-source the large-scale EbmKG and the evaluation benchmarks to facilitate future research on RAG for EBM, highlighting practical implications for accuracy, safety, and resource efficiency in medical AI systems.

Abstract

Evidence-based medicine (EBM) plays a crucial role in the application of large language models (LLMs) in healthcare, as it provides reliable support for medical decision-making processes. Although EBM benefits from current retrieval-augmented generation (RAG) technologies, it still faces two significant challenges: collecting dispersed evidence and organizing that evidence efficiently to support the complex queries EBM requires. To tackle these issues, we propose using LLMs to gather scattered evidence from multiple sources, and we present a knowledge hypergraph-based evidence management model that integrates this evidence while capturing its intricate relationships. Furthermore, to better support complex queries, we have developed an Importance-Driven Evidence Prioritization (IDEP) algorithm that uses the LLM to generate multiple evidence features, each with an associated importance score; these features are then used to rank the evidence and produce the final retrieval results. Experimental results on six datasets demonstrate that our approach outperforms existing RAG techniques in application domains of interest to EBM, such as medical quizzing, hallucination detection, and decision support. The test sets and the constructed knowledge graph can be accessed at https://drive.google.com/file/d/1WJ9QTokK3MdkjEmwuFQxwH96j_Byawj_/view?usp=drive_link.
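To make the ranking idea in the abstract concrete, here is a minimal Python sketch of importance-weighted evidence ranking. It is not the authors' IDEP implementation: the `Feature` class, the substring-matching rule, the additive scoring, and the toy strings are all assumptions made for illustration. In the paper the features and their importance scores are produced by an LLM, whereas here they are hard-coded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """Hypothetical evidence feature; the paper obtains these from an LLM."""
    text: str
    importance: float


def score_evidence(evidence: str, features: list[Feature]) -> float:
    """Sum the importance of every feature whose text occurs in the evidence.

    Case-insensitive substring matching is an assumption of this sketch.
    """
    lowered = evidence.lower()
    return sum(f.importance for f in features if f.text.lower() in lowered)


def rank_evidence(
    evidences: list[str], features: list[Feature], top_k: int = 3
) -> list[tuple[str, float]]:
    """Return the top_k evidence items sorted by descending score."""
    scored = [(e, score_evidence(e, features)) for e in evidences]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy inputs invented for this example (not taken from EbmKG).
toy_features = [
    Feature("atrial fibrillation", 3.0),
    Feature("anticoagulation", 2.0),
    Feature("elderly", 1.0),
]
toy_evidence = [
    "Guideline excerpt on statin use in elderly patients.",
    "Guideline excerpt on rate control for atrial fibrillation.",
    "Guideline excerpt on anticoagulation for atrial fibrillation.",
]

ranked = rank_evidence(toy_evidence, toy_features, top_k=2)
for text, score in ranked:
    print(f"{score:.1f}  {text}")
```

The additive score is only one plausible way to combine feature importances; a weighted similarity over embeddings would fit the same interface.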

Paper Structure

This paper contains 53 sections, 8 equations, 17 figures, 4 tables.

Figures (17)

  • Figure 1: The phenomenon of mis-decomposition of complex relationships. LLMs omit the conditional variable "intra-atrial reentrant" for "tachycardia," leading to incorrect extraction.
  • Figure 2: The schema of EbmKG. The green ellipse denotes the hyperrelation corresponding to a topic. The blue entities denote topic keywords, while red entities indicate evidence keywords. Evidence under the same topic has the same label.
  • Figure 3: The construction process of EbmKG. Colored squares represent evidence: the color distinguishes the source of the evidence, and the text inside the square gives its specific label. Colored circles represent entities extracted from the evidence. White ovals represent topics, which are derived from evidence sharing the same aspect words.
  • Figure 4: This figure is divided into two sections: the upper part outlines data processing, and the lower part provides an example. In 'Step 1 Example', entities in the EbmKG are shown as circles, topics as diamonds, and evidence as rectangles. Only the green nodes, filtered by Personalized PageRank, advance to Step 2. In 'Step 2 Example', an LLM assigns a Usefulness Score to each topic based on predefined Search Conditions, which determines the final evidence score. For the NER and Feature Extraction Prompts, see the figure labeled fig:all_prompt (Search Words Extraction and Evidence Features Extraction). For the Search Conditions, see the figure labeled fig:score_rules.
  • Figure 5: The influence of model parameter count on IdepRAG. Retriever (72B): this configuration uses a model with 72 billion parameters when executing the IDEP algorithm to retrieve high-quality evidence. Retriever (S): this setup uses models with the same parameter scale in both the retrieval and generation phases.
  • ...and 12 more figures
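
The topic-locating step described in the Figure 4 caption (Step 1) filters nodes with Personalized PageRank. The sketch below shows the general technique on a tiny entity-topic graph using plain power iteration. It is an illustration under assumptions, not the paper's procedure: the graph layout, the node names, the damping factor `alpha`, and the iteration count are all hypothetical.

```python
def build_undirected(edges: list[tuple[str, str]]) -> dict[str, set[str]]:
    """Build an undirected adjacency map from (node, node) pairs."""
    graph: dict[str, set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def personalized_pagerank(
    graph: dict[str, set[str]],
    seeds: set[str],
    alpha: float = 0.85,
    iterations: int = 50,
) -> dict[str, float]:
    """Power-iteration PageRank whose restart mass is spread over the seeds."""
    if not seeds or not seeds <= graph.keys():
        raise ValueError("seeds must be a non-empty subset of the graph nodes")
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(restart)
    for _ in range(iterations):
        # Every node first receives its share of the restart probability.
        nxt = {n: (1.0 - alpha) * restart[n] for n in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                # Dangling node: hand its mass back to the seeds.
                for s in seeds:
                    nxt[s] += alpha * rank[node] / len(seeds)
                continue
            share = alpha * rank[node] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        rank = nxt
    return rank


# Toy graph invented for this example: entities linked to the topics they appear in.
graph = build_undirected([
    ("tachycardia", "topic:arrhythmia"),
    ("ablation", "topic:arrhythmia"),
    ("aspirin", "topic:analgesia"),
])
ranks = personalized_pagerank(graph, seeds={"tachycardia"})
topics = sorted(
    (n for n in graph if n.startswith("topic:")),
    key=lambda n: ranks[n],
    reverse=True,
)
print(topics[0])
```

Because the walk restarts only at the query entities, topics that are not reachable from them receive no probability mass, which is the property that makes this kind of ranking usable as a filter before a more expensive LLM scoring step.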