Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven Retrieval-Augmented Generation

Yifan Feng, Hao Hu, Xingliang Hou, Shiquan Liu, Shihui Ying, Shaoyi Du, Han Hu, Yue Gao

TL;DR

Hyper-RAG introduces a hypergraph-driven Retrieval-Augmented Generation framework to mitigate LLM hallucinations in high-stakes domains by representing domain knowledge with both low-order and high-order correlations. It builds a hypergraph-based knowledge base offline from domain corpora and couples it with a vector store to support retrieval-augmented generation, enabling robust open-domain QA in medical and other domains. Empirical results on NeurologyCrop and nine other corpora show substantial accuracy gains over direct LLM usage and over conventional Graph RAG and Light RAG methods, along with robustness to nested questioning and improved diversity and coherence. The work also provides an efficiency-conscious Hyper-RAG-Lite variant and a comprehensive evaluation methodology that combines scoring-based and selection-based assessments to quantify hallucination reduction and answer reliability.
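To make the low-order/high-order distinction concrete, here is a minimal sketch of a hypergraph knowledge store. It assumes entities are plain strings and that each extracted correlation is stored as one hyperedge (a set of entities plus its source description): hyperedges with one or two entities are low-order, and those with three or more are high-order. The class names `Hyperedge` and `HypergraphKB`, their methods, the toy entries, and the retrieval rule (return every hyperedge touching a queried entity, highest order first) are hypothetical illustrations, not the authors' implementation; the vector-store half of the pipeline is omitted. The `is_k_uniform` helper mirrors the "2-uniform hypergraph" notion from the Figure 1 caption.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Hyperedge:
    """One extracted correlation: the entities it links plus its description."""
    entities: frozenset
    description: str

    @property
    def order(self) -> int:
        # Number of entities joined by this correlation.
        return len(self.entities)

    @property
    def is_high_order(self) -> bool:
        # Low-order: self-relations (1 entity) or pairwise (2); high-order: 3+.
        return self.order >= 3


@dataclass
class HypergraphKB:
    """Hypothetical in-memory hypergraph store (illustration only)."""
    edges: list = field(default_factory=list)
    incidence: dict = field(default_factory=dict)  # entity -> list of edge indices

    def add(self, entities, description: str) -> None:
        edge = Hyperedge(frozenset(entities), description)
        idx = len(self.edges)
        self.edges.append(edge)
        for entity in edge.entities:
            self.incidence.setdefault(entity, []).append(idx)

    def is_k_uniform(self, k: int) -> bool:
        # Every hyperedge joins exactly k entities; a plain graph is 2-uniform.
        return all(edge.order == k for edge in self.edges)

    def retrieve(self, query_entities) -> list:
        # Toy rule: gather hyperedges incident to any query entity, then order
        # them by descending order (ties broken by insertion index).
        hits = {i for e in query_entities for i in self.incidence.get(e, [])}
        ranked = sorted(hits, key=lambda i: (-self.edges[i].order, i))
        return [self.edges[i] for i in ranked]


# Toy entries for illustration only.
kb = HypergraphKB()
kb.add({"migraine", "aura"}, "Migraine can present with an aura.")
kb.add({"migraine", "triptan", "vasoconstriction"},
       "Triptans treat migraine partly through vasoconstriction.")
kb.add({"epilepsy", "EEG"}, "EEG is used to evaluate epilepsy.")

for edge in kb.retrieve(["migraine"]):
    kind = "high-order" if edge.is_high_order else "low-order"
    print(edge.order, kind, edge.description)
```

Keeping an entity-to-hyperedge incidence index means a lookup never scans the whole edge list, and a single three-entity hyperedge records a joint relation that a set of pairwise edges could only approximate.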

Abstract

Large language models (LLMs) have transformed various sectors, including education, finance, and medicine, by enhancing content generation and decision-making processes. However, their adoption in the medical field remains cautious because of hallucinations, that is, instances where generated content deviates from factual accuracy, which can lead to adverse outcomes. To address this, we introduce Hyper-RAG, a hypergraph-driven Retrieval-Augmented Generation method that comprehensively captures both pairwise and beyond-pairwise correlations in domain-specific knowledge, thereby mitigating hallucinations. Experiments on the NeurologyCrop dataset with six prominent LLMs demonstrated that Hyper-RAG improves accuracy by an average of 12.3% over direct LLM use and outperforms Graph RAG and Light RAG by 6.3% and 6.0%, respectively. Additionally, Hyper-RAG maintained stable performance as query complexity increased, whereas existing methods declined. Further validation across nine diverse datasets showed a 35.5% performance improvement over Light RAG under a selection-based assessment. The lightweight variant, Hyper-RAG-Lite, achieved twice the retrieval speed and a 3.3% performance boost compared with Light RAG. These results confirm Hyper-RAG's effectiveness in enhancing LLM reliability and reducing hallucinations, making it a robust solution for high-stakes applications such as medical diagnostics.

Paper Structure

This paper contains 22 sections, 7 equations, 8 figures, and 4 tables.

Figures (8)

  • Figure 1: Illustration of Complex Correlation Modeling in Data. a, The real-world entity space, depicting the various entities present in the dataset. b, Potential complex correlations among these entities, including low-order correlations such as pairwise correlations or self-relations, and high-order correlations involving interactions among three or more entities. c, Visualization of entity correlations using circles to represent correlations between entities. The structure is modeled as a 2-uniform hypergraph, emphasizing pairwise connections. Another example illustrates correlations among three and four entities, with circles encompassing three and four entities, respectively.
  • Figure 2: Illustration of Entity and Correlation Extraction from Raw Corpus: Dark brown boxes represent entities, blue arrows denote low-order correlations between entities, and red arrows indicate high-order correlations. Yellow boxes contain the original descriptions of the respective entities or their correlations.
  • Figure 3: Results of Integrating Hyper-RAG with Different Large Language Models. Each LLM displayed on the x-axis represents the respective base model as indicated by its label. The other RAG methods shown are enhancements built upon these base models. The evaluation scores are calculated as the average of five scoring-based assessment metrics. The results demonstrate that Hyper-RAG consistently improves performance by an average of 12.3% across six LLMs, highlighting its effectiveness in enhancing model capabilities through integration with LLMs.
  • Figure 4: Detailed results of integrating Hyper-RAG with various LLMs.
  • Figure 5: Experimental results for questions of different difficulty levels. The first subplot summarizes the results across the three difficulty levels, with each score representing the average of five dimension-based assessments. The subsequent three subplots display the response quality scores across five dimensions for the different methods, one subplot per difficulty level. The x-axis displays six evaluation metrics: Comp. (Comprehensiveness), Dive. (Diversity), Empo. (Empowerment), Logi. (Logical), Read. (Readability), and Overall (the average of these five metrics).
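The captions for Figures 3 and 5 describe the reported score as the plain average of five dimension scores. Below is a minimal sketch of that aggregation. It assumes each dimension has already been scored on a common numeric scale and that the average is unweighted. The dictionary keys reuse the abbreviations from the Figure 5 caption, the function name `overall_score` is hypothetical, and the example scores are placeholders, not results from the paper.

```python
from statistics import fmean

# Dimension abbreviations as listed in the Figure 5 caption.
DIMENSIONS = ("Comp.", "Dive.", "Empo.", "Logi.", "Read.")


def overall_score(scores: dict) -> float:
    """Return the unweighted mean of the five dimension scores."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    return fmean(scores[d] for d in DIMENSIONS)


# Placeholder scores for one method on one question set (not from the paper).
example = {"Comp.": 80.0, "Dive.": 70.0, "Empo.": 75.0, "Logi.": 85.0, "Read.": 90.0}
print(overall_score(example))  # 80.0
```

Failing loudly on a missing dimension keeps an incomplete judgment from silently being averaged over four scores instead of five.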