Unified Active Retrieval for Retrieval Augmented Generation

Qinyuan Cheng, Xiaonan Li, Shimin Li, Qin Zhu, Zhangyue Yin, Yunfan Shao, Linyang Li, Tianxiang Sun, Hang Yan, Xipeng Qiu

TL;DR

This work tackles the problem of when to apply Retrieval-Augmented Generation by introducing Unified Active Retrieval (UAR), which casts four orthogonal criteria (intent, knowledge, time-sensitivity, and self-awareness) as lightweight plug-and-play binary classifiers attached to a frozen LLM. UAR-Criteria combines these judgements in a standardized decision tree that governs retrieval timing across diverse user instructions. On AR-Bench and six downstream tasks, UAR consistently outperforms single-criterion baselines, avoids unnecessary retrieval, and retrieves when information is time-sensitive or unknown to the model. For real-world RAG systems, the approach balances retrieval utility against latency while leaving the model's internal capabilities intact.

Abstract

In Retrieval-Augmented Generation (RAG), retrieval is not always helpful, and applying it to every instruction is sub-optimal. Determining whether to retrieve, usually referred to as Active Retrieval, is therefore crucial for RAG. However, existing active retrieval methods face two challenges: (1) they usually rely on a single criterion, which struggles to handle various types of instructions; (2) they depend on specialized and highly differentiated procedures, so combining them makes the RAG system more complicated and leads to higher response latency. To address these challenges, we propose Unified Active Retrieval (UAR). UAR contains four orthogonal criteria and casts them into plug-and-play classification tasks, which achieve multifaceted retrieval timing judgements with negligible extra inference cost. We further introduce the Unified Active Retrieval Criteria (UAR-Criteria), designed to process diverse active retrieval scenarios through a standardized procedure. Experiments on four representative types of user instructions show that UAR significantly outperforms existing work on both retrieval timing judgement and downstream task performance.
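
The standardized procedure described above can be pictured as a short cascade over the four criteria. The following is a minimal sketch, not the authors' implementation: the order of the checks is an assumption (the abstract names the criteria but not their sequence), and `Criteria`, `should_retrieve`, and the keyword-based toy judges are hypothetical stand-ins for the trained classifiers.

```python
from dataclasses import dataclass
from typing import Callable

# A judge maps an instruction to a yes/no answer. In UAR each judge would be
# a trained binary classifier; here they are hypothetical placeholders.
Judge = Callable[[str], bool]


@dataclass
class Criteria:
    wants_retrieval: Judge    # intent: does the user explicitly ask to retrieve?
    needs_knowledge: Judge    # knowledge: does answering require factual knowledge?
    is_time_sensitive: Judge  # time-sensitivity: can the answer change over time?
    model_knows: Judge        # self-awareness: does the model already know the answer?


def should_retrieve(instruction: str, c: Criteria) -> bool:
    """Assumed cascade: each check either settles the decision or defers."""
    if c.wants_retrieval(instruction):
        return True   # explicit user intent wins
    if not c.needs_knowledge(instruction):
        return False  # no factual knowledge needed, so skip retrieval
    if c.is_time_sensitive(instruction):
        return True   # internal knowledge may be stale
    return not c.model_knows(instruction)  # retrieve only for unknown facts


# Toy keyword judges, for illustration only.
toy = Criteria(
    wants_retrieval=lambda s: "search" in s.lower(),
    needs_knowledge=lambda s: s.strip().endswith("?"),
    is_time_sensitive=lambda s: "today" in s.lower() or "latest" in s.lower(),
    model_knows=lambda s: "capital of france" in s.lower(),
)

if __name__ == "__main__":
    for text in [
        "Search the web for UAR.",
        "Write a poem about autumn.",
        "What is the latest iPhone model?",
        "What is the capital of France?",
    ]:
        print(text, "->", should_retrieve(text, toy))
```

Because every judge is a plain yes/no function, the cascade stays the same whichever classifier backs each criterion, which is the sense in which the procedure is standardized.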

Paper Structure (40 sections, 3 figures, 6 tables)

Figures (3)

  • Figure 1: Different types of user instructions that cannot be handled by a single active retrieval criterion.
  • Figure 2: Overview of the UAR framework. Icons in the figure mark which parameters are frozen and which are updated. Each MLP is a single fully connected layer whose input dimension equals the model's hidden state dimension and whose output dimension is 2.
  • Figure 3: The impact of the number of reference documents on model performance.
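
The classifier head described in the Figure 2 caption (one fully connected layer from the hidden state dimension to 2 outputs) can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the paper's code: `BinaryHead` is a hypothetical name, the random initialization is arbitrary, the rule that index 1 means "yes" is assumed, and which hidden state of the frozen LLM feeds the head is not specified in this section.

```python
import random
from typing import List, Sequence


class BinaryHead:
    """One fully connected layer: hidden_dim inputs, 2 output logits."""

    def __init__(self, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        # weight has shape 2 x hidden_dim; only these parameters (and the
        # bias) would be trained, while the LLM itself stays frozen.
        self.weight: List[List[float]] = [
            [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)] for _ in range(2)
        ]
        self.bias: List[float] = [0.0, 0.0]

    def logits(self, hidden: Sequence[float]) -> List[float]:
        if len(hidden) != len(self.weight[0]):
            raise ValueError("hidden state has the wrong dimension")
        return [
            sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(self.weight, self.bias)
        ]

    def predict(self, hidden: Sequence[float]) -> bool:
        # Assumed convention: index 0 is "no", index 1 is "yes".
        no_logit, yes_logit = self.logits(hidden)
        return yes_logit > no_logit


if __name__ == "__main__":
    head = BinaryHead(hidden_dim=3)
    # Hand-set weights so the outcome is easy to follow by eye.
    head.weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    print(head.predict([0.0, 1.0, 0.0]))  # the "yes" logit is larger
    print(head.predict([1.0, 0.0, 0.0]))  # the "no" logit is larger
```

Since each head only reads a hidden state the frozen model already computes, adding a criterion costs one small matrix product, consistent with the abstract's claim of negligible extra inference cost.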