Hyper-RAG: Combating LLM Hallucinations using Hypergraph-Driven Retrieval-Augmented Generation
Yifan Feng, Hao Hu, Xingliang Hou, Shiquan Liu, Shihui Ying, Shaoyi Du, Han Hu, Yue Gao
TL;DR
Hyper-RAG introduces a hypergraph-driven Retrieval-Augmented Generation framework that mitigates LLM hallucinations in high-stakes domains by representing domain knowledge with both low-order (pairwise) and high-order (beyond-pairwise) correlations. It builds a hypergraph-based knowledge base offline from domain corpora and couples it with a vector store to support retrieval-augmented generation, enabling robust open-domain QA in medical and other domains. Empirical results on NeurologyCrop and nine other corpora show substantial accuracy gains over direct LLM use and over the conventional Graph RAG and Light RAG methods. Hyper-RAG also remains robust under nested questioning and produces more diverse and coherent answers. The work further provides Hyper-RAG-Lite, an efficiency-oriented variant, and a comprehensive evaluation methodology that combines scoring-based and selection-based assessments to quantify hallucination reduction and answer reliability.
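The offline-build, online-retrieve architecture described above can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' implementation. Entities and facts are assumed to be already extracted from the corpus, the class `HypergraphKB` and its methods are hypothetical, and a toy bag-of-words cosine similarity stands in for a real embedding model and vector store. A hyperedge linking two entities plays the role of a low-order correlation, and one linking three or more plays the role of a high-order correlation.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Toy bag-of-words vector (assumption): stands in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class HypergraphKB:
    """Hypothetical hypergraph knowledge base: entities are vertices, and each
    hyperedge joins two (low-order) or more (high-order) entities and stores
    the text of the fact that relates them."""

    def __init__(self):
        self.entity_desc = {}              # entity name -> short description
        self.edges = []                    # list of (frozenset of entities, fact text)
        self.incidence = defaultdict(set)  # entity name -> indices of incident hyperedges

    def add_entity(self, name, description):
        self.entity_desc[name] = description

    def add_hyperedge(self, entities, fact):
        idx = len(self.edges)
        self.edges.append((frozenset(entities), fact))
        for e in entities:
            self.incidence[e].add(idx)

    def retrieve(self, query, k=2):
        """Rank entities by similarity to the query (the vector-store step),
        then return every hyperedge fact incident to the top-k entities."""
        q = embed(query)
        ranked = sorted(
            self.entity_desc,
            key=lambda e: cosine(q, embed(e + " " + self.entity_desc[e])),
            reverse=True,
        )
        seeds = ranked[:k]
        edge_ids = sorted(set().union(*(self.incidence[e] for e in seeds)))
        return seeds, [self.edges[i][1] for i in edge_ids]


# Toy offline build (illustrative data only).
kb = HypergraphKB()
kb.add_entity("levodopa", "drug converted to dopamine in the brain")
kb.add_entity("dopamine", "neurotransmitter")
kb.add_entity("parkinson", "neurodegenerative disease with loss of dopamine neurons")
kb.add_entity("tremor", "involuntary rhythmic shaking")
kb.add_hyperedge(["parkinson", "tremor"],
                 "Tremor is a common motor symptom of Parkinson's disease.")
kb.add_hyperedge(["levodopa", "dopamine", "parkinson"],
                 "Levodopa is converted to dopamine and is used to treat Parkinson's disease.")

# Online retrieval: the returned facts would be placed in the LLM prompt.
seeds, facts = kb.retrieve("what treats parkinson disease", k=1)
print(seeds)
for fact in facts:
    print("-", fact)
```

Because retrieval expands from the matched entity to every incident hyperedge, the three-entity fact about levodopa is returned whole even though the query never mentions levodopa or dopamine.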
Abstract
Large language models (LLMs) have transformed various sectors, including education, finance, and medicine, by enhancing content generation and decision-making processes. However, their adoption in the medical field has been cautious because of hallucinations, instances in which generated content deviates from factual accuracy and can lead to adverse outcomes. To address this, we introduce Hyper-RAG, a hypergraph-driven Retrieval-Augmented Generation method that comprehensively captures both pairwise and beyond-pairwise correlations in domain-specific knowledge, thereby mitigating hallucinations. Experiments on the NeurologyCrop dataset with six prominent LLMs demonstrated that Hyper-RAG improves accuracy by an average of 12.3% over direct LLM use and outperforms Graph RAG and Light RAG by 6.3% and 6.0%, respectively. Additionally, Hyper-RAG maintained stable performance as query complexity increased, whereas the performance of existing methods declined. Further validation across nine diverse datasets showed a 35.5% performance improvement over Light RAG under a selection-based assessment. The lightweight variant, Hyper-RAG-Lite, achieved twice the retrieval speed of Light RAG and a 3.3% performance gain over it. These results confirm Hyper-RAG's effectiveness in enhancing LLM reliability and reducing hallucinations, making it a robust solution for high-stakes applications such as medical diagnostics.
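Why beyond-pairwise correlations matter can be made concrete with a small sketch. This is our own illustration, not taken from the paper, and the entity labels A, B, C are placeholders. It uses clique expansion, the standard way of turning a hypergraph into a pairwise graph, to show that a single fact jointly involving three entities becomes indistinguishable from three separate two-entity facts once only pairwise edges are stored, whereas a hyperedge keeps the group intact.

```python
from itertools import combinations


def clique_expand(hyperedges):
    """Replace each hyperedge by all pairwise edges among its vertices,
    i.e. the only information a purely pairwise graph can keep."""
    pairs = set()
    for edge in hyperedges:
        for u, v in combinations(sorted(edge), 2):
            pairs.add((u, v))
    return pairs


# One fact that jointly relates three entities ...
joint = [frozenset({"A", "B", "C"})]
# ... versus three separate facts that each relate only two of them.
separate = [frozenset({"A", "B"}), frozenset({"B", "C"}), frozenset({"A", "C"})]

# The pairwise view collapses both cases to the same three edges.
print(clique_expand(joint) == clique_expand(separate))
# The hypergraph view still distinguishes them.
print(set(joint) == set(separate))
```

The first comparison is true and the second is false, which is the information a pairwise-only knowledge graph loses and a hypergraph retains.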
