ConfRAG: Confidence-Guided Retrieval-Augmenting Generation
Yin Huang, Yifan Ethan Xu, Kai Sun, Vera Yan, Alicia Sun, Haidar Khan, Jimmy Nguyen, Jingxiang Chen, Mohammad Kachuee, Zhaojiang Lin, Yue Liu, Aaron Colak, Anuj Kumar, Wen-tau Yih, Xin Luna Dong
TL;DR
ConfRAG addresses factual hallucinations and retrieval costs by training a confidence-calibrated QA module (ConfQA) that answers only when it is confident and otherwise signals that it is unsure. ConfRAG uses this unsureness signal to trigger a retrieval-augmented pipeline only when needed, achieving high accuracy under ideal conditions while substantially reducing unnecessary external retrievals. Across seven benchmarks, ConfQA reduces hallucination rates to below 5%, and ConfRAG delivers strong triggering performance, with real RAG deployments yielding competitive accuracy and meaningful latency savings. The work formalizes a practical framework for integrating calibrated internal knowledge with external sources to improve factuality and efficiency in real-world AI systems.
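The triggering rule described above (answer from internal knowledge, retrieve only on abstention) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `conf_rag`, `retrieval_rate`, and the toy model functions are hypothetical, and abstention is detected with a plain substring check on the "I am unsure" response.

```python
from typing import Callable, List, Tuple

# Abstention string the ConfQA model is trained to emit.
UNSURE = "I am unsure"


def conf_rag(
    question: str,
    confqa_answer: Callable[[str], str],  # hypothetical: fine-tuned ConfQA model
    rag_answer: Callable[[str], str],     # hypothetical: retrieval-augmented pipeline
) -> Tuple[str, bool]:
    """Answer from internal knowledge; invoke RAG only when the model abstains.

    Returns (answer, retrieval_triggered).
    """
    draft = confqa_answer(question)
    # Assumption: a case-insensitive substring check detects abstention;
    # a real system may normalize or classify the output instead.
    if UNSURE.lower() in draft.lower():
        return rag_answer(question), True
    return draft, False


def retrieval_rate(triggered: List[bool]) -> float:
    """Fraction of questions that invoked external retrieval."""
    return sum(triggered) / len(triggered) if triggered else 0.0


# Toy stand-ins (hypothetical) for the model and the retriever.
KNOWN = {"capital of France?": "Paris"}


def toy_confqa(q: str) -> str:
    return KNOWN.get(q, UNSURE)


def toy_rag(q: str) -> str:
    return f"[retrieved answer for: {q}]"


results = [conf_rag(q, toy_confqa, toy_rag)
           for q in ["capital of France?", "CEO of a startup?"]]
print(results)
# [('Paris', False), ('[retrieved answer for: CEO of a startup?]', True)]
print(retrieval_rate([t for _, t in results]))  # 0.5
```

Retrieval savings follow directly from the abstention rate: every question the model answers confidently is one retrieval call that is never made.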
Abstract
Can Large Language Models (LLMs) be trained to avoid hallucinating factual statements, and can Retrieval-Augmented Generation (RAG) be triggered only when necessary to reduce retrieval and computation costs? In this work, we address both challenges simultaneously. We introduce ConfQA, a fine-tuning strategy that reduces hallucination rates from 20-40% to below 5% across multiple factuality benchmarks. The approach is simple: when the model answers correctly, it is trained to output the answer; otherwise, it is trained to respond with "I am unsure". Two design choices make this training effective: (1) a dampening prompt ("answer only if you are confident") that explicitly discourages overconfident hallucinations, and (2) training data drawn from atomic factual statements (e.g., knowledge graph attribute values), which calibrates model confidence and yields robust generalization across domains and question types. Building on ConfQA, we propose ConfRAG, a triggering strategy that invokes RAG only when the model responds with "I am unsure". This framework achieves accuracy above 95% in the ideal case while reducing unnecessary external retrievals by over 30%.
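The ConfQA labeling rule can be sketched as below: the dampening prompt is attached to each question, and the training target is the model's own answer when it is correct and "I am unsure" otherwise. This is an illustrative sketch only; the `QAExample` record, the normalized exact-match correctness check, and the exact prompt layout are assumptions, and the paper's correctness judgment and data format may differ.

```python
from dataclasses import dataclass
from typing import Dict, List

# Dampening prompt quoted in the abstract; its placement here is an assumption.
DAMPENING = "Answer only if you are confident."
UNSURE = "I am unsure"


@dataclass
class QAExample:
    question: str      # e.g. generated from a KG (entity, attribute) pair
    gold: str          # ground-truth attribute value
    model_answer: str  # base model's answer before fine-tuning


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace for a crude exact-match check."""
    return " ".join(text.lower().split())


def is_correct(ex: QAExample) -> bool:
    # Assumption: normalized exact match stands in for the paper's judge.
    return normalize(ex.model_answer) == normalize(ex.gold)


def build_sft_example(ex: QAExample) -> Dict[str, str]:
    """Keep the answer if the model was right; otherwise teach abstention."""
    prompt = f"{ex.question} {DAMPENING}"
    target = ex.model_answer if is_correct(ex) else UNSURE
    return {"prompt": prompt, "target": target}


data: List[QAExample] = [
    QAExample("What is the birth year of Ada Lovelace?", "1815", "1815"),
    QAExample("What is the birth year of Alan Turing?", "1912", "1914"),
]
sft = [build_sft_example(ex) for ex in data]
for row in sft:
    print(row["target"])
# 1815
# I am unsure
```

Using atomic facts such as attribute values keeps the correctness check simple, which is one plausible reason this kind of labeling is tractable at scale.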
