SUGAR: Leveraging Contextual Confidence for Smarter Retrieval

Hanna Zubkova, Ji-Hoon Park, Seong-Whan Lee

TL;DR

This paper addresses inefficiency and noise in always-on retrieval for knowledge-intensive QA by introducing SUGAR, an adaptive retrieval framework that uses semantic entropy ($SE$) to decide whether to answer from internal knowledge or retrieve external context, and to choose between single-step and multi-step retrieval via a threshold $\tau$. The paper shows that semantic-entropy-guided retrieval improves accuracy and reduces the number of retrieval steps on both single-hop and multi-hop QA tasks, at a manageable latency overhead for computing $SE$. The approach requires no task-specific training and provides a robust, data-agnostic mechanism for balancing internal and external knowledge in LLMs. Overall, SUGAR offers practical gains in QA performance and inference efficiency, with potential applicability to broader language understanding tasks where knowledge boundaries must be managed.
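The routing described above can be sketched as a simple threshold rule. The TL;DR mentions a threshold $\tau$, while the Figure 2 caption refers to several thresholds, so this sketch assumes two hypothetical thresholds, `tau_low` and `tau_high`: below `tau_low` the model answers from parametric knowledge, between the two it retrieves once, and at or above `tau_high` it retrieves iteratively. The names `Route` and `choose_route`, the mapping of entropy bands to strategies, and the threshold values are illustrative assumptions, not taken from the paper.

```python
from enum import Enum


class Route(Enum):
    """Possible actions for a query (names are illustrative, not from the paper)."""
    NO_RETRIEVAL = "answer from parametric knowledge"
    SINGLE_STEP = "single-step retrieval"
    MULTI_STEP = "multi-step retrieval"


def choose_route(semantic_entropy: float, tau_low: float, tau_high: float) -> Route:
    """Map a semantic-entropy score to a retrieval strategy.

    Assumes two hypothetical thresholds with tau_low <= tau_high:
    low entropy -> trust the model, medium -> retrieve once,
    high -> retrieve iteratively.
    """
    if tau_low > tau_high:
        raise ValueError("tau_low must not exceed tau_high")
    if semantic_entropy < tau_low:
        return Route.NO_RETRIEVAL
    if semantic_entropy < tau_high:
        return Route.SINGLE_STEP
    return Route.MULTI_STEP


if __name__ == "__main__":
    # Placeholder thresholds, not values reported in the paper.
    for se in (0.1, 0.8, 1.6):
        print(se, choose_route(se, tau_low=0.5, tau_high=1.2).value)
```

Keeping the router a pure function of the score would make it easy to tune the thresholds on held-out data without touching the retriever or the generator.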

Abstract

Bearing in mind the limited parametric knowledge of Large Language Models (LLMs), retrieval-augmented generation (RAG), which supplies them with relevant external knowledge, has served as an approach to mitigate hallucinations to a certain extent. However, uniformly retrieving supporting context makes response generation resource-inefficient: triggering the retriever is not always necessary, and it can even lead to inaccurate answers when the model is distracted by noisy retrieved content. Motivated by these issues, we introduce Semantic Uncertainty Guided Adaptive Retrieval (SUGAR), which leverages context-based entropy to actively decide whether to retrieve and, if so, whether to use single-step or multi-step retrieval. Our empirical results show that selective retrieval guided by semantic uncertainty estimation improves performance across diverse question answering tasks and makes inference more efficient.
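For reference, semantic entropy as defined by Kuhn et al. (2023) is given below; this summary does not state SUGAR's exact formula, so treating its "context-based entropy" as this standard formulation is an assumption. Answers $s$ sampled for an input $x$ are grouped into clusters $c$ of semantically equivalent answers, and entropy is taken over meanings rather than token sequences. With $M$ samples, a common discrete estimate replaces $p(c \mid x)$ by the fraction of samples falling in cluster $c$:

$$
p(c \mid x) = \sum_{s \in c} p(s \mid x), \qquad
SE(x) = -\sum_{c} p(c \mid x)\,\log p(c \mid x), \qquad
\hat{p}(c \mid x) = \frac{|c|}{M}
$$

A low $SE$ means the sampled answers agree in meaning; a high $SE$ means they scatter across several distinct meanings.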

Paper Structure

This paper contains 11 sections, 1 equation, 2 figures, and 3 tables.

Figures (2)

  • Figure 1: Overview of the proposed retrieval strategy. Semantic entropy measures how confident the model is in answering the question from its parametric knowledge. (A) If semantic entropy is low, the answer is generated from internal knowledge; (B) if semantic entropy is high, the retriever is triggered to find relevant external knowledge, which is then used to generate the answer.
  • Figure 2: Semantic entropy levels and corresponding accuracy. The color gradient indicates retrieval frequency (the lighter the color, the less often retrieval is triggered); the semantic entropy levels used as thresholds are marked in red.
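The decision in Figure 1 hinges on a semantic-entropy score. The sketch below computes a discrete estimate from several answers sampled for the same question: answers are greedily grouped into meaning clusters (each answer is compared against a cluster's first member), and the entropy of the cluster frequencies is returned in nats. Assumptions: the frequency-based estimate is used instead of weighting by sequence probabilities; `normalized_match` is a hypothetical stand-in for the bidirectional-entailment check (typically an NLI model) used in the semantic-entropy literature; sampling answers from an LLM is out of scope. None of these function names come from the paper.

```python
import math
from typing import Callable, List


def cluster_by_meaning(answers: List[str],
                       equivalent: Callable[[str, str], bool]) -> List[List[str]]:
    """Greedily group answers: each joins the first cluster whose first member it matches."""
    clusters: List[List[str]] = []
    for ans in answers:
        for cluster in clusters:
            if equivalent(cluster[0], ans):
                cluster.append(ans)
                break
        else:
            # No existing cluster matched, so this answer starts a new meaning.
            clusters.append([ans])
    return clusters


def normalized_match(a: str, b: str) -> bool:
    """Hypothetical stand-in for bidirectional entailment: case/punctuation-insensitive match."""
    def norm(s: str) -> str:
        return "".join(ch for ch in s.lower() if ch.isalnum() or ch.isspace()).strip()
    return norm(a) == norm(b)


def discrete_semantic_entropy(answers: List[str],
                              equivalent: Callable[[str, str], bool] = normalized_match) -> float:
    """Entropy (in nats) of the empirical distribution over meaning clusters."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    clusters = cluster_by_meaning(answers, equivalent)
    n = len(answers)
    probs = [len(c) / n for c in clusters]
    # H = sum over clusters of -p * log(p)
    return sum(-p * math.log(p) for p in probs)


if __name__ == "__main__":
    confident = ["Paris", "paris.", "Paris", "Paris"]
    uncertain = ["Paris", "Lyon", "Marseille", "Paris"]
    print(discrete_semantic_entropy(confident))
    print(discrete_semantic_entropy(uncertain))
```

The first example (all samples share one meaning) scores 0, corresponding to path (A) in Figure 1; the second spreads over three meanings with frequencies 0.5, 0.25 and 0.25 and scores about 1.04 nats, which would trigger path (B) under any threshold below that value.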