SUGAR: Leveraging Contextual Confidence for Smarter Retrieval
Hanna Zubkova, Ji-Hoon Park, Seong-Whan Lee
TL;DR
This paper addresses inefficiency and noise in always-on retrieval for knowledge-intensive QA by introducing SUGAR, an adaptive retrieval framework that uses semantic entropy $SE$ to decide whether to answer from internal knowledge or fetch external context and to choose between single-step and multi-step retrieval via a threshold $\tau$. It demonstrates that semantic-entropy-guided retrieval improves accuracy and reduces the number of retrieval steps on both single-hop and multi-hop QA tasks, with a manageable latency overhead from computing $SE$. The approach requires no task-specific training and provides a robust, data-agnostic mechanism to balance internal and external knowledge in LLMs. Overall, SUGAR offers practical gains in QA performance and inference efficiency, with potential applicability to broader language understanding tasks where knowledge boundaries must be managed.
Abstract
Given the limited parametric knowledge of Large Language Models (LLMs), retrieval-augmented generation (RAG), which supplies them with relevant external knowledge, has served as an approach to mitigate hallucinations to a certain extent. However, uniformly retrieving supporting context makes response generation inefficient, as triggering the retriever is not always necessary, and it can even hurt accuracy when the model gets distracted by noisy retrieved content and produces an unhelpful answer. Motivated by these issues, we introduce Semantic Uncertainty Guided Adaptive Retrieval (SUGAR), which leverages context-based entropy to actively decide whether to retrieve and, if so, to choose between single-step and multi-step retrieval. Our empirical results show that selective retrieval guided by semantic uncertainty estimation improves performance across diverse question answering tasks and yields more efficient inference.
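The routing idea summarized above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: answers sampled from the model are grouped into meaning clusters by normalized exact string match (a crude stand-in for a proper semantic-equivalence check), entropy is computed over the empirical cluster frequencies, and two hypothetical thresholds `tau_low` and `tau_high` separate the three behaviours. The function names and threshold values are illustrative only.

```python
import math
from collections import Counter


def semantic_entropy(answers, normalize=lambda a: a.strip().lower()):
    """Entropy over meaning clusters of sampled answers.

    Assumption: two answers share a meaning if their normalized strings
    match. A real system would use a semantic-equivalence check instead.
    """
    counts = Counter(normalize(a) for a in answers)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return sum(-p * math.log(p) for p in probs)


def choose_strategy(se, tau_low, tau_high):
    """Map an entropy value to a retrieval strategy.

    Assumption: low uncertainty means the model answers on its own,
    moderate uncertainty triggers one retrieval step, and high
    uncertainty triggers multi-step retrieval.
    """
    if se < tau_low:
        return "no_retrieval"
    if se < tau_high:
        return "single_step"
    return "multi_step"


if __name__ == "__main__":
    # Hypothetical sampled answers for three questions.
    samples = {
        "confident": ["Paris", "paris", " Paris "],
        "unsure": ["1912", "1913", "1912", "1913"],
        "lost": ["Alice", "Bob", "Carol"],
    }
    for name, answers in samples.items():
        se = semantic_entropy(answers)
        print(name, round(se, 3), choose_strategy(se, tau_low=0.3, tau_high=0.9))
```

In this sketch the cost of the decision is the extra sampling needed to estimate the entropy, which corresponds to the latency overhead noted in the TL;DR; the thresholds would have to be chosen empirically.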
