Unified Active Retrieval for Retrieval Augmented Generation
Qinyuan Cheng, Xiaonan Li, Shimin Li, Qin Zhu, Zhangyue Yin, Yunfan Shao, Linyang Li, Tianxiang Sun, Hang Yan, Xipeng Qiu
TL;DR
This work addresses the question of when to apply Retrieval-Augmented Generation by introducing Unified Active Retrieval (UAR), which casts four orthogonal criteria (intent, knowledge, time-sensitivity, and self-awareness) as lightweight plug-and-play binary classifiers attached to a fixed LLM. UAR-Criteria combines these classifiers into a standardized decision tree that governs retrieval timing across diverse user instructions. On AR-Bench and six downstream tasks, UAR consistently outperforms single-criterion baselines: it avoids unnecessary retrieval and triggers retrieval when the required information is time-sensitive or unknown to the model. For real-world RAG systems, the approach balances retrieval utility against latency while preserving the model's internal capabilities.
Abstract
In Retrieval-Augmented Generation (RAG), retrieval is not always helpful, and applying it to every instruction is sub-optimal. Determining whether to retrieve, usually referred to as Active Retrieval, is therefore crucial for RAG. However, existing active retrieval methods face two challenges: (1) they usually rely on a single criterion, which struggles to handle various types of instructions; (2) they depend on specialized and highly differentiated procedures, so combining them makes the RAG system more complicated and increases response latency. To address these challenges, we propose Unified Active Retrieval (UAR). UAR contains four orthogonal criteria and casts them into plug-and-play classification tasks, achieving multifaceted retrieval timing judgements with negligible extra inference cost. We further introduce the Unified Active Retrieval Criteria (UAR-Criteria), designed to process diverse active retrieval scenarios through a standardized procedure. Experiments on four representative types of user instructions show that UAR significantly outperforms existing work on both retrieval timing judgement and downstream task performance.
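To make the idea of a standardized procedure concrete, the following is a minimal sketch of how four binary judgements (intent, knowledge, time-sensitivity, self-awareness) could be combined into a single retrieve-or-not decision. The order of the checks is an assumption made for illustration, since the text above does not spell out the decision tree. The names `CriteriaSignals` and `should_retrieve` are hypothetical, and in a real system each boolean would come from one of the classifiers rather than being set by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CriteriaSignals:
    """Hypothetical container for the four binary classifier outputs."""
    intent: bool          # the user explicitly asks for retrieval
    knowledge: bool       # the instruction requires factual knowledge
    time_sensitive: bool  # the answer may change over time
    model_knows: bool     # self-awareness: the model already holds the answer


def should_retrieve(s: CriteriaSignals) -> bool:
    """Combine the four signals in an assumed fixed order."""
    if s.intent:
        return True       # explicit request for retrieval
    if not s.knowledge:
        return False      # no factual knowledge needed, so retrieval adds nothing
    if s.time_sensitive:
        return True       # internal knowledge may be out of date
    return not s.model_knows  # retrieve only when the model lacks the answer


# Example: a factual, time-insensitive question the model cannot answer itself.
unknown_fact = CriteriaSignals(intent=False, knowledge=True,
                               time_sensitive=False, model_knows=False)
print(should_retrieve(unknown_fact))
```

Because the checks run in a fixed order and each one is a single binary decision, the combined procedure stays a short chain of comparisons, which is in keeping with the claim of negligible extra inference cost.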
