Do Retrieval Augmented Language Models Know When They Don't Know?

Youchao Zhou, Heyan Huang, Yicheng Liu, Rui Dai, Xinglin Wang, Xingchen Zhang, Shumin Shi, Yang Deng

TL;DR

This paper investigates whether retrieval-augmented language models (RALMs) know when they do not know by examining uncertainty estimation and refusal behavior across internal and external knowledge states. It evaluates three families of uncertainty estimation methods (verbalization-based, consistency-based, and similarity-matrix-based) alongside two refusal post-training methods (R-tuning and in-context fine-tuning, ICFT) to understand calibration, over-refusal, and the interaction between refusals and retrieved evidence. The study finds that external context strongly shapes calibration and that over-refusal emerges when the retrieved context is negative. ICFT can mitigate this tendency, but improved refusal does not always align with better calibration or accuracy. To address over-refusal, the authors propose a two-stage, knowledge-state-aware post-refusal approach that balances refusing with providing correct answers, achieving higher overall output quality. These findings highlight the need for more robust uncertainty estimation and explicit modeling of dynamic knowledge when designing refusal mechanisms for RALMs.

Abstract

Existing large language models (LLMs) occasionally generate plausible yet factually incorrect responses, known as hallucinations. Two main approaches have been proposed to mitigate hallucinations: retrieval-augmented language models (RALMs) and refusal post-training. However, current research predominantly focuses on their individual effectiveness while overlooking the evaluation of the refusal capability of RALMs. Ideally, if RALMs know when they do not know, they should refuse to answer. In this study, we ask the fundamental question: Do RALMs know when they don't know? Specifically, we investigate three questions. First, are RALMs well calibrated with respect to different internal and external knowledge states? We examine the influence of various factors. Contrary to expectations, when all retrieved documents are irrelevant, RALMs still tend to refuse questions they could have answered correctly. Next, given the model's pronounced over-refusal behavior, we raise a second question: How does a RALM's refusal ability align with its calibration quality? Our results show that the over-refusal problem can be mitigated through in-context fine-tuning. However, we observe that improved refusal behavior does not necessarily imply better calibration or higher overall accuracy. Finally, we ask: Can we combine refusal-aware RALMs with uncertainty-based answer abstention to mitigate over-refusal? We develop a simple yet effective refusal mechanism for refusal-post-trained RALMs that improves their overall answer quality by balancing refusal and correct answers. Our study provides a more comprehensive understanding of the factors influencing RALM behavior. Meanwhile, we emphasize that uncertainty estimation for RALMs remains an open problem deserving deeper investigation.
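
To make the idea of uncertainty-based answer abstention concrete, the following is a minimal sketch of a generic consistency-based confidence score with a refusal threshold. It is not the authors' refusal mechanism: the function names, the exact-match normalization, the refusal string, and the 0.6 threshold are all assumptions chosen for illustration.

```python
from collections import Counter


def consistency_confidence(samples):
    """Return the majority answer and the fraction of samples agreeing with it.

    `samples` is a list of answer strings sampled for the same question.
    Answers are compared after a simple normalization (strip + lowercase),
    which is an assumption; real systems often use semantic matching.
    """
    if not samples:
        raise ValueError("at least one sampled answer is required")
    normalized = [s.strip().lower() for s in samples]
    answer, count = Counter(normalized).most_common(1)[0]
    return answer, count / len(normalized)


def answer_or_abstain(samples, threshold=0.6, refusal="I don't know"):
    """Answer with the majority response only if agreement reaches the threshold."""
    answer, confidence = consistency_confidence(samples)
    return answer if confidence >= threshold else refusal


if __name__ == "__main__":
    print(answer_or_abstain(["Paris", "paris ", "Lyon", "Paris"]))
    print(answer_or_abstain(["Paris", "Lyon", "Rome"]))
```

Raising the threshold trades answered questions for fewer wrong answers, so a threshold set too high reproduces the over-refusal behavior the abstract describes.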

Paper Structure

This paper contains 28 sections, 4 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: An illustration of the knowledge boundary of a RALM and the corresponding answer correctness. We divide the knowledge state into four quadrants based on the model’s internal knowledge and the knowledge provided by external context. The question at the gray dot lies outside the model’s knowledge boundary, whereas the question at the blue dot lies within it. However, given irrelevant context, the model may still refuse to answer the blue-dot question.
  • Figure 2: Refusal and answer confusion matrix. “Should answer/refuse” is the ground-truth label, while “answer correct/incorrect” and “refuse” describe the model's response.
  • Figure 3: The reliability diagram under different internal and external knowledge states. The blue bars show answer precision. The pink bars indicate the over-confident gap, and the purple bars indicate the under-confident gap.
  • Figure 4: The answer precision (denoted as "accuracy") and refusal rate vary according to the internal/external knowledge states. A fully negative context (0 pos) leads to a significant decrease in accuracy and an increase in refusal on “highlyknown” questions.
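
The reliability diagram in Figure 3 can be illustrated with a short sketch of how such bins are typically computed. This assumes the standard equal-width binning and the standard expected calibration error (ECE); the paper's exact binning, metric, and variable names are not given here, so `reliability_bins` and `expected_calibration_error` are hypothetical helpers.

```python
def reliability_bins(confidences, correct, n_bins=10):
    """Group predictions into equal-width confidence bins.

    Returns one (count, mean_confidence, precision) tuple per non-empty bin.
    A bin is over-confident when mean_confidence > precision and
    under-confident when precision > mean_confidence.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Confidence 1.0 falls into the last bin rather than out of range.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    rows = []
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        precision = sum(1 for _, ok in items if ok) / len(items)
        rows.append((len(items), mean_conf, precision))
    return rows


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |precision - mean_confidence| over non-empty bins."""
    total = len(confidences)
    if total == 0:
        raise ValueError("no predictions to evaluate")
    return sum(
        count / total * abs(precision - mean_conf)
        for count, mean_conf, precision in reliability_bins(confidences, correct, n_bins)
    )


if __name__ == "__main__":
    confs = [0.9, 0.9, 0.1, 0.1]
    labels = [True, False, False, False]
    for count, mean_conf, precision in reliability_bins(confs, labels):
        print(count, round(mean_conf, 2), round(precision, 2))
    print(round(expected_calibration_error(confs, labels), 2))
```

In this toy input both bins are over-confident (mean confidence exceeds precision), which corresponds to the pink gap in the figure.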