HalluGuard: Evidence-Grounded Small Reasoning Models to Mitigate Hallucinations in Retrieval-Augmented Generation
Loris Bergeron, Ioana Buhnila, Jérôme François, Radu State
TL;DR
HalluGuard introduces a 4B-parameter Small Reasoning Model (SRM) that mitigates hallucinations in Retrieval-Augmented Generation (RAG) by producing evidence-grounded justifications. The approach combines a domain-agnostic synthetic data pipeline (HalluClaim) with multi-stage curation, prompt-guided data reformulation, and a two-generator preference setup. A Qwen-4B backbone is then fine-tuned with LoRA and Odds Ratio Preference Optimization (ORPO) to distill large-model reasoning into a compact model. On the LLM-AggreFact benchmark, HalluGuard-4B achieves a balanced accuracy (BAcc) of 75.7%, rivaling larger models; on RAGTruth it reaches 84.0% BAcc with high true negative and true positive rates. Ablation studies highlight the roles of reasoning traces, consensus filtering, and preference alignment in boosting performance. The model's interpretable, evidence-backed justifications make it suitable for enterprise RAG deployments. The work also emphasizes reproducibility: the authors plan to release HalluGuard and the related datasets under Apache 2.0.
Abstract
Large Language Models (LLMs) excel in many NLP tasks but remain prone to hallucinations, limiting trust in real-world applications. We present HalluGuard, a 4B-parameter Small Reasoning Model (SRM) for mitigating hallucinations in Retrieval-Augmented Generation (RAG). HalluGuard classifies document-claim pairs as grounded or hallucinated and produces evidence-grounded justifications for transparency. Our approach combines (i) a domain-agnostic synthetic dataset derived from FineWeb and refined through multi-stage curation and data reformation, (ii) synthetic grounded and hallucinated claims, and (iii) preference-based fine-tuning with Odds Ratio Preference Optimization (ORPO) to distill large-model reasoning into a smaller backbone. On the RAGTruth subset of the LLM-AggreFact benchmark, HalluGuard achieves 84.0% balanced accuracy (BAcc), rivaling the specialized models MiniCheck (7B; 84.0%) and Granite Guardian 3.3 (8B; 82.2%) while using roughly half their parameters. Over the full benchmark it reaches 75.7% BAcc, comparable to larger general-purpose LLMs such as GPT-4o (75.9%). We will release HalluGuard and datasets under Apache 2.0 upon acceptance.
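For readers unfamiliar with the reported metric, balanced accuracy is the mean of the true positive rate and the true negative rate, so it is not inflated when one class dominates. The following is a minimal sketch, not the authors' evaluation code. The function name `balanced_accuracy`, the label encoding (1 = grounded, 0 = hallucinated), and the toy labels are assumptions made for illustration only.

```python
from typing import Sequence


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Return (TPR + TNR) / 2 for binary labels.

    Assumed encoding (illustrative, not taken from the paper):
    1 = grounded, 0 = hallucinated.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four confusion-matrix cells.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")

    tpr = tp / (tp + fn)  # true positive rate (recall on class 1)
    tnr = tn / (tn + fp)  # true negative rate (recall on class 0)
    return (tpr + tnr) / 2


# Toy labels, invented for this example: four grounded and two hallucinated claims.
toy_true = [1, 1, 1, 1, 0, 0]
toy_pred = [1, 1, 1, 0, 0, 1]
toy_bacc = balanced_accuracy(toy_true, toy_pred)  # TPR = 0.75, TNR = 0.5
```

On these toy labels plain accuracy would be 4/6, while balanced accuracy weights the two classes equally regardless of how many examples each has.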
