
HyKGE: A Hypothesis Knowledge Graph Enhanced Framework for Accurate and Reliable Medical LLMs Responses

Xinke Jiang, Ruizhe Zhang, Yongxin Xu, Rihong Qiu, Yue Fang, Zhiyuan Wang, Jinyi Tang, Hongxin Ding, Xu Chu, Junfeng Zhao, Yasha Wang

TL;DR

HyKGE tackles reliability gaps in medical LLMs by combining hypothesis-driven pre-retrieval exploration, knowledge-graph-based retrieval, and a fragment-aware post-retrieval reranker. The framework uses a Hypothesis Output to guide entity anchoring in a large medical KG, retrieves reasoning chains, and prunes noise before passing structured knowledge to an LLM Reader. Experiments on Chinese medical Q&A datasets show that HyKGE outperforms baselines in accuracy and interpretability while maintaining reasonable efficiency. The work demonstrates a practical path to more trustworthy medical AI, with potential to generalize to additional languages and knowledge graphs.

Abstract

In this paper, we investigate retrieval-augmented generation (RAG) based on Knowledge Graphs (KGs) to improve the accuracy and reliability of Large Language Models (LLMs). Recent approaches suffer from insufficient and repetitive knowledge retrieval, tedious and time-consuming query parsing, and monotonous knowledge utilization. To this end, we develop a Hypothesis Knowledge Graph Enhanced (HyKGE) framework, which leverages LLMs' powerful reasoning capacity to compensate for the incompleteness of user queries, optimizes the interaction process with LLMs, and provides diverse retrieved knowledge. Specifically, HyKGE exploits the zero-shot capability and rich knowledge of LLMs through Hypothesis Outputs to extend feasible exploration directions in the KGs, and uses a carefully curated prompt to enhance the density and efficiency of LLMs' responses. Furthermore, we introduce the HO Fragment Granularity-aware Rerank Module to filter out noise while balancing diversity and relevance in the retrieved knowledge. Experiments on two Chinese medical multiple-choice question datasets and one Chinese open-domain medical Q&A dataset with two LLM turbos demonstrate the superiority of HyKGE in terms of accuracy and explainability.

Paper Structure (30 sections, 7 equations, 5 figures, 4 tables, 1 algorithm)

Figures (5)

  • Figure 1: (a) KGRAG (Left). Basic KGRAG extracts key entities from user queries and searches for the corresponding entities within the KG, which are then fed into LLMs along with the query. (b) HyKGE (Right). HyKGE first queries LLMs to obtain a hypothesis output and extracts entities from both the hypothesis output and the query. HyKGE then retrieves reasoning chains between any two anchor entities and feeds the reasoning chains together with the query into LLMs.
  • Figure 2: The overall framework of HyKGE. HyKGE first feeds the user query ($\mathcal{Q}$) through the LLMs and obtains the Hypothesis Output ($\mathcal{HO}$). In the NER Module, a W2NER model is then applied to recognize entities and isolate relations. Through the GTE Encoder, the recognized entities are linked with entities in the KGs. After that, HyKGE extracts three types of relevant reasoning chains from the KGs. Because $\mathcal{Q}$ is sparse, the HO Fragment Granularity-aware Rerank Module chunks $\mathcal{Q}$ and $\mathcal{HO}$ and aligns the resulting fragments with the reasoning chains via a TopK Chains Reranker to eliminate irrelevant knowledge. Finally, the retrieved knowledge is organized together with the user query, and answers are obtained through the LLM Reader.
  • Figure 3: The prompt formats of the Hypothesis Output Module (top) and the LLM Reader (bottom).
  • Figure 4: Case study. We show the User Query $\mathcal{Q}$, Hypothesis Output $\mathcal{HO}$, Retrieved Reasoning Chains $\mathcal{RC}$, and Pruned Reasoning Chains $\mathcal{RC}_{\texttt{prune}}$ of HyKGE using GPT-3.5 Turbo to verify the interpretability and effectiveness of HyKGE. Red shading signifies that the knowledge or answer is derived from evidence in $\mathcal{Q}$, blue shading indicates that the evidence originates from $\mathcal{HO}$, and green shading represents an answer corrected with the help of $\mathcal{KG}$ despite initially being false in $\mathcal{HO}$.
  • Figure 5: (Left) Hyper-parameter study of the KG hop $k$, from 1 to 5, on MMCU-Medical and CMB-Exam with GPT-3.5 Turbo. (Right) Hyper-parameter study of the reranker $topK$, from 5 to 50, on MMCU-Medical and CMB-Exam with GPT-3.5 Turbo.
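
The retrieval and pruning steps described in the captions of Figures 1 and 2 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy triples, the function names (`build_adjacency`, `find_chains`, `retrieve_chains`, `rerank`), and the token-overlap score are all assumptions made for illustration. In particular, the sketch covers only path-style chains between two anchor entities (not all three chain types mentioned in Figure 2), takes the anchor entities as given instead of running W2NER and the GTE Encoder, and uses a simple Jaccard overlap as a stand-in for the TopK Chains Reranker.

```python
from collections import deque
from itertools import combinations
import re

# Toy knowledge graph as (head, relation, tail) triples. Purely illustrative.
TRIPLES = [
    ("fever", "symptom_of", "influenza"),
    ("cough", "symptom_of", "influenza"),
    ("influenza", "treated_by", "oseltamivir"),
    ("cough", "symptom_of", "bronchitis"),
]


def build_adjacency(triples):
    """Index triples in both directions so chains can traverse either way."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((rel, tail))
        adj.setdefault(tail, []).append((f"inverse_{rel}", head))
    return adj


def find_chains(adj, source, target, max_hops):
    """Breadth-first enumeration of simple paths with at most max_hops edges.

    A chain is a flat list [e0, r0, e1, r1, ..., en]; entities sit at even indices.
    """
    chains = []
    queue = deque([(source, [source])])
    while queue:
        node, path = queue.popleft()
        hops = len(path) // 2
        if node == target and hops > 0:
            chains.append(path)
            continue
        if hops == max_hops:
            continue
        for rel, nxt in adj.get(node, []):
            if nxt not in path[::2]:  # do not revisit an entity on this path
                queue.append((nxt, path + [rel, nxt]))
    return chains


def retrieve_chains(adj, anchors, max_hops):
    """Collect chains between every pair of anchor entities."""
    chains = []
    for a, b in combinations(anchors, 2):
        chains.extend(find_chains(adj, a, b, max_hops))
    return chains


def tokenize(text):
    return set(re.findall(r"\w+", text.lower()))


def rerank(chains, fragments, top_k):
    """Keep the top_k chains by best Jaccard overlap with any text fragment.

    The overlap score is an assumed placeholder for a learned reranker.
    """
    def score(chain):
        chain_tokens = tokenize(" ".join(chain).replace("_", " "))
        return max(
            len(chain_tokens & tokenize(f)) / len(chain_tokens | tokenize(f))
            for f in fragments
        )
    return sorted(chains, key=score, reverse=True)[:top_k]


if __name__ == "__main__":
    adj = build_adjacency(TRIPLES)
    # Anchors would come from the query and the hypothesis output.
    anchors = ["fever", "cough", "oseltamivir"]
    chains = retrieve_chains(adj, anchors, max_hops=2)
    # Fragments stand in for chunks of the query and the hypothesis output.
    fragments = ["patient has fever", "influenza is often treated by oseltamivir"]
    for chain in rerank(chains, fragments, top_k=2):
        print(" -> ".join(chain))
```

Here `max_hops` plays the role of the KG hop $k$ and `top_k` the role of the reranker $topK$ studied in Figure 5. Scoring each chain against its best-matching fragment, not against the whole text at once, mirrors the fragment-level alignment the Figure 2 caption describes.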

Theorems & Definitions (2)

  • Definition 1: Knowledge Graph
  • Definition 2: Knowledge Graph Retrieval