To Retrieve or Not to Retrieve? Uncertainty Detection for Dynamic Retrieval Augmented Generation

Kaustubh D. Dhole

TL;DR

This work investigates when to call external retrieval in Retrieval-Augmented Generation (RAG) for long-form, multi-hop QA. It introduces a dynamic retrieval framework driven by uncertainty detection, including a forward-looking future-sentence uncertainty test, multiple sequence-level uncertainty measures, and a subquery generator to acquire missing information. On 2WikiMultihopQA, uncertainty-based triggers, especially the Eccentricity metric, cut retrieval calls by roughly half with only a modest drop in QA accuracy. The study highlights both promising uncertainty estimators and their trade-offs, offering guidance for deploying cost-efficient RAG systems in real-world tasks.

Abstract

Retrieval-Augmented Generation equips large language models with the capability to retrieve external knowledge, thereby mitigating hallucinations by incorporating information beyond the model's intrinsic abilities. However, most prior work invokes retrieval deterministically, an approach that is poorly suited to tasks such as long-form question answering. Performing retrieval dynamically, invoking it only when the underlying LLM lacks the required knowledge, can be more efficient. In this context, we examine the question "To Retrieve or Not to Retrieve?" by exploring multiple uncertainty detection methods. We evaluate these methods on long-form question answering with dynamic retrieval and present our comparisons. Our findings suggest that uncertainty detection metrics, such as Degree Matrix Jaccard and Eccentricity, can reduce the number of retrieval calls by almost half, with only a slight reduction in question-answering accuracy.
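
The idea of retrieving only when the model appears uncertain can be illustrated with a small sketch. It assumes one common formulation of a degree-matrix score over Jaccard similarities: sample several continuations, build a pairwise word-overlap similarity matrix, and treat low average similarity as high uncertainty. The function names, the whitespace tokenization, and the 0.5 threshold are hypothetical choices made for illustration and are not taken from the paper.

```python
from itertools import product


def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity between two strings."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0  # two empty strings are treated as identical
    return len(sa & sb) / len(sa | sb)


def degree_uncertainty(samples: list[str]) -> float:
    """Assumed degree-matrix score: trace(m*I - D) / m^2.

    D is the degree matrix of the pairwise similarity graph, so the
    score equals 1 minus the mean of all pairwise similarities
    (self-similarities included).
    """
    m = len(samples)
    if m == 0:
        raise ValueError("need at least one sampled continuation")
    total = sum(jaccard(a, b) for a, b in product(samples, repeat=2))
    return 1.0 - total / (m * m)


def should_retrieve(samples: list[str], threshold: float = 0.5) -> bool:
    """Trigger retrieval only when the samples disagree enough.

    The threshold is a hypothetical value and would need tuning.
    """
    return degree_uncertainty(samples) > threshold
```

In a dynamic retrieval loop, `samples` would be several candidate next sentences drawn from the LLM. When they largely agree, generation continues without retrieval; when they diverge, a retrieval call is issued before the sentence is committed.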

Paper Structure (11 sections, 3 equations, 2 tables)