Probing-RAG: Self-Probing to Guide Language Models in Selective Document Retrieval

Ingeol Baek, Hwan Chang, Byeongjeong Kim, Jimin Lee, Hwanhee Lee

TL;DR

Probing-RAG tackles the challenge of when to retrieve external knowledge by tapping into the language model’s internal hidden states with a lightweight prober. The prober, trained on synthetic open-domain QA data, predicts whether additional retrieval will improve answer quality, enabling adaptive retrieval that avoids unnecessary external information. Across five open-domain QA datasets, Probing-RAG outperforms prior adaptive retrieval methods while reducing retrieval calls by about 50% on average, and demonstrates robust consistency and strong correlation between prober accuracy and QA performance. This approach offers a practical, efficient mechanism to mitigate external-knowledge conflicts and hallucinations by balancing internal model knowledge with retrieved content. The findings highlight the potential of internal-state-guided retrieval to enhance real-world RAG systems, with limitations related to access to hidden states and training data requirements.

Abstract

Retrieval-Augmented Generation (RAG) enhances language models by retrieving and incorporating relevant external knowledge. However, the traditional retrieve-and-generate process may not be optimized for real-world scenarios, where a query might require multiple retrieval steps or none at all. In this paper, we propose Probing-RAG, which uses the hidden state representations from the intermediate layers of language models to adaptively determine whether additional retrieval is necessary for a given query. By employing a pre-trained prober, Probing-RAG effectively captures the model's internal cognition, enabling reliable decisions about whether to retrieve external documents. Experimental results across five open-domain QA datasets demonstrate that Probing-RAG outperforms previous methods while reducing the number of redundant retrieval steps.
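The decision loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a linear prober that maps one hidden-state vector to two logits (no retrieval, retrieval) and retrieves again while the retrieval logit is larger. The names `prober_logits`, `needs_retrieval`, `adaptive_answer`, `max_steps`, and the stub generator and retriever are hypothetical, and in a real system the hidden state would be read from an intermediate layer of the language model rather than from a stub.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def prober_logits(hidden: Vector, weights: Sequence[Vector], bias: Vector) -> List[float]:
    """Hypothetical linear prober: hidden state -> [no_retrieve, retrieve] logits."""
    return [sum(w * h for w, h in zip(row, hidden)) + b for row, b in zip(weights, bias)]


def needs_retrieval(hidden: Vector, weights: Sequence[Vector], bias: Vector) -> bool:
    """Retrieve only when the 'retrieve' logit exceeds the 'no_retrieve' logit."""
    no_retrieve, retrieve = prober_logits(hidden, weights, bias)
    return retrieve > no_retrieve


def adaptive_answer(
    query: str,
    generate: Callable[[str, List[str]], Tuple[str, Vector]],
    retrieve: Callable[[str], List[str]],
    weights: Sequence[Vector],
    bias: Vector,
    max_steps: int = 3,  # assumed cap on retrieval calls, not taken from the paper
) -> Tuple[str, int]:
    """Generate, probe the hidden state, and retrieve again only if the prober asks."""
    docs: List[str] = []
    answer, hidden = generate(query, docs)
    calls = 0
    while calls < max_steps and needs_retrieval(hidden, weights, bias):
        docs.extend(retrieve(query))
        calls += 1
        answer, hidden = generate(query, docs)
    return answer, calls


# Stub components for illustration only.
def stub_generate(query: str, docs: List[str]) -> Tuple[str, Vector]:
    # Pretend the hidden state signals uncertainty until a document is available.
    return ("answer with docs", [0.0, 1.0]) if docs else ("answer without docs", [1.0, 0.0])


def stub_retrieve(query: str) -> List[str]:
    return ["a retrieved passage"]


# Toy prober weights: 'retrieve' logit reads dimension 0, 'no_retrieve' reads dimension 1.
WEIGHTS = [[0.0, 1.0], [1.0, 0.0]]
BIAS = [0.0, 0.0]

result = adaptive_answer("example question", stub_generate, stub_retrieve, WEIGHTS, BIAS)
```

With these stubs, the loop retrieves once and then stops, because the second hidden state no longer triggers the prober. The cap `max_steps` only guards against a prober that never signals that retrieval can stop.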

Paper Structure

This paper contains 33 sections, 6 equations, 6 figures, 13 tables, 1 algorithm.

Figures (6)

  • Figure 1: The left example illustrates how redundant retrieval steps, guided by an external query complexity classifier that does not reflect the LLM's internal knowledge, can lead to wrong answers. In contrast, the right example shows that the model uses the prober to recognize that no further retrieval is needed, allowing it to generate the correct answer.
  • Figure 2: A conceptual comparison of various Adaptive-RAG approaches. (A) determines whether to perform retrieval based on query complexity measured by an external classifier. (B) decides retrieval based on the response from the LLM. (C) uses the confidence of the final token selection to determine retrieval. (D) Our proposed Probing-RAG decides retrieval using a prober model, which utilizes the internal hidden states of the LLM.
  • Figure 3: Examples from the prober training dataset.
  • Figure 4: (Left) Probing accuracy for each model across layers. (Right) Correlation between the prober's classification performance and QA performance, using the Gemma-2b model.
  • Figure 5: Kernel density estimate plot of logits using the Gemma-2b model, where orange indicates retrieval calls and blue indicates no retrieval needed. Marginal distributions are shown on the top and right. The results are projected onto the 10th and 12th residual post positions.
  • ...and 1 more figures