Embedding-Informed Adaptive Retrieval-Augmented Generation of Large Language Models
Chengkai Huang, Yu Xia, Rui Wang, Kaige Xie, Tong Yu, Julian McAuley, Lina Yao
TL;DR
This paper tackles the problem of when to augment large language models with external retrieval in adaptive retrieval-augmented generation (ARAG). It introduces Embedding-Informed ARAG (EI-ARAG), which uses pre-trained token embeddings from the first contextualized layer to predict whether external knowledge is necessary, avoiding access to pre-training data or additional LLM inferences. A lightweight 3-layer MLP classifier is trained on ${\textsf{embed}}_{\text{1st}}(\text{T}(q))$ and a binary label, enabling test-time retrieval decisions that balance accuracy and efficiency. Across PopQA and TriviaQA, EI-ARAG achieves higher accuracy with lower retrieval rates and substantially reduced latency compared to prior adaptive methods, demonstrating a practical, embedding-driven mechanism for efficient ARAG deployment.
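The retrieval decision described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the mean pooling over token embeddings, the ReLU activations, the hidden width, the 0.5 threshold, and the names `RetrievalGate`, `mean_pool`, and `answer` are all assumptions made for illustration. The weights here are randomly initialized, whereas in practice they would be learned from (embedding, binary label) pairs.

```python
import math
import random
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]


def linear(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Affine map: one output per row of W."""
    return [sum(w * v for w, v in zip(row, x)) + bias for row, bias in zip(W, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def mean_pool(token_embeddings: Matrix) -> Vector:
    """Average token embeddings into one query vector (pooling choice is an assumption)."""
    n = len(token_embeddings)
    return [sum(col) / n for col in zip(*token_embeddings)]


class RetrievalGate:
    """Hypothetical 3-layer MLP mapping a query vector to P(retrieval is needed)."""

    def __init__(self, dim: int, hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            scale = 1.0 / math.sqrt(cols)
            return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

        # Three linear layers: dim -> hidden -> hidden -> 1.
        self.layers = [
            (init(hidden, dim), [0.0] * hidden),
            (init(hidden, hidden), [0.0] * hidden),
            (init(1, hidden), [0.0]),
        ]

    def prob_retrieve(self, x: Vector) -> float:
        h = x
        for W, b in self.layers[:-1]:
            h = relu(linear(h, W, b))
        W, b = self.layers[-1]
        return sigmoid(linear(h, W, b)[0])


def answer(
    token_embeddings: Matrix,
    gate: RetrievalGate,
    generate: Callable[[], str],
    retrieve_and_generate: Callable[[], str],
    threshold: float = 0.5,
) -> str:
    """Call the retrieval path only when the gate's probability reaches the threshold."""
    p = gate.prob_retrieve(mean_pool(token_embeddings))
    return retrieve_and_generate() if p >= threshold else generate()
```

Because the gate reads only embeddings that the model computes for the query anyway, the decision costs one small MLP forward pass rather than an extra LLM call, which is the efficiency argument the summary makes.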
Abstract
Retrieval-augmented large language models (LLMs) have proven remarkably competent in various NLP tasks. However, previous work has observed that retrieval is not always helpful, especially when the LLM is already knowledgeable about the query. Motivated by this, Adaptive Retrieval-Augmented Generation (ARAG) studies retrieving only when the knowledge the query asks for is absent from the LLM. Previous ARAG methods either require access to the pre-training corpus or rely on prompting with additional model inferences. To avoid these drawbacks, we propose to determine whether the model is knowledgeable about a query by inspecting the (contextualized) pre-trained token embeddings of LLMs. We hypothesize that such embeddings capture rich information about the model's intrinsic knowledge base, which enables an efficient way of judging whether retrieval from an external corpus is necessary. Extensive experiments demonstrate our ARAG approach's superior performance across various benchmarks.
