Embedding-Informed Adaptive Retrieval-Augmented Generation of Large Language Models

Chengkai Huang, Yu Xia, Rui Wang, Kaige Xie, Tong Yu, Julian McAuley, Lina Yao

TL;DR

This paper addresses the question of when to augment a large language model with external retrieval, the core decision in adaptive retrieval-augmented generation (ARAG). It introduces Embedding-Informed ARAG (EI-ARAG), which uses the LLM's pre-trained token embeddings from the first contextualized layer to predict whether external knowledge is needed, without access to the pre-training data or additional LLM inference passes. A lightweight 3-layer MLP classifier is trained on $\text{embed}_{\text{1st}}(\text{T}(q))$ paired with a binary label, and its prediction drives the test-time retrieval decision, balancing accuracy and efficiency. On PopQA and TriviaQA, EI-ARAG achieves higher accuracy with lower retrieval rates and substantially lower latency than prior adaptive methods, demonstrating a practical, embedding-driven mechanism for efficient ARAG deployment.

Abstract

Retrieval-augmented large language models (LLMs) have proven remarkably competent in various NLP tasks. However, previous work has observed that retrieval is not always helpful, especially when the LLM is already knowledgeable about the query. Motivated by this, Adaptive Retrieval-Augmented Generation (ARAG) studies retrieving only when the knowledge the query asks for is absent from the LLM. Previous ARAG approaches either require access to the pre-training corpus or rely on prompting with additional model inferences. To avoid these drawbacks, we propose to determine whether the model is knowledgeable about a query by inspecting the (contextualized) pre-trained token embeddings of LLMs. We hypothesize that such embeddings capture rich information about the model's intrinsic knowledge base, which enables an efficient way of judging whether retrieval from an external corpus is necessary. Extensive experiments demonstrate our ARAG approach's superior performance across various benchmarks.
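
The retrieval decision described above (a small MLP over query embeddings that outputs a retrieve / do-not-retrieve signal) can be illustrated with a minimal sketch. This is not the authors' implementation. Mean pooling over tokens, the hidden size, ReLU activations, the sigmoid output, and the 0.5 threshold are all assumptions made for illustration. The weights are random rather than trained, and the random vectors stand in for first-layer LLM embeddings. All function names are hypothetical.

```python
import math
import random


def mean_pool(token_embeddings):
    """Average per-token vectors into one query vector (pooling is an assumption)."""
    n = len(token_embeddings)
    dim = len(token_embeddings[0])
    return [sum(tok[i] for tok in token_embeddings) / n for i in range(dim)]


def linear(x, weights, bias):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(rng, n_in, n_out):
    """Random (untrained) weights; a real classifier would be fit on labeled queries."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def init_mlp(rng, dim, hidden):
    """Three dense layers: dim -> hidden -> hidden -> 1 (sizes are assumptions)."""
    return [init_layer(rng, dim, hidden),
            init_layer(rng, hidden, hidden),
            init_layer(rng, hidden, 1)]


def retrieval_probability(mlp, token_embeddings):
    """Score in (0, 1): higher means the classifier favors retrieving."""
    x = mean_pool(token_embeddings)
    for weights, bias in mlp[:-1]:
        x = relu(linear(x, weights, bias))
    weights, bias = mlp[-1]
    return sigmoid(linear(x, weights, bias)[0])


def should_retrieve(mlp, token_embeddings, threshold=0.5):
    """Binary retrieval decision; the threshold value is an assumption."""
    return retrieval_probability(mlp, token_embeddings) >= threshold


rng = random.Random(0)
mlp = init_mlp(rng, dim=8, hidden=4)
# Stand-in for the first-layer embeddings of a 5-token query.
query_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
p = retrieval_probability(mlp, query_embeddings)
print(f"retrieve probability: {p:.3f}, retrieve: {should_retrieve(mlp, query_embeddings)}")
```

In a real pipeline the query embeddings would come from the LLM itself and the classifier would be trained on labeled queries; the sketch only shows that the decision costs one small forward pass rather than an additional LLM call.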

Paper Structure (20 sections, 1 equation, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Visualization of embeddings at different layers of LLaMA 2 7B for director-related questions in PopQA. Darker color indicates the question contains entities with higher frequency in pre-training data.
  • Figure 2: Per-relationship type results on PopQA by different models, showing overall accuracy of EI-ARAG using 0th and 1st layer embeddings based on BM25 RALM.
  • Figure 3: Sankey diagram for our EI-ARAG method on the TriviaQA dataset.