Self-Routing RAG: Binding Selective Retrieval with Knowledge Verbalization

Di Wu; Jia-Chen Gu; Kai-Wei Chang; Nanyun Peng

TL;DR

Existing selective retrieval methods underutilize the LLM's internal knowledge; SR-RAG addresses this by making knowledge verbalization a core component. It reframes selective retrieval as knowledge source selection between external retrieval and internal verbalization, and trains the model with a two-stage objective that couples source selection, verbalization, and answer generation. A nearest-neighbor policy makes source selection more robust at inference time under domain shifts. Empirically, SR-RAG outperforms always-retrieve and prior selective retrieval baselines across multiple LLMs and four knowledge-intensive tasks, reducing retrieval calls by up to ~40% and improving accuracy by up to ~19%, depending on the model. The approach offers scalable, efficient, and knowledge-aware RAG, with broad applicability to multi-source routing and cost-aware inference in real-world systems.

Abstract

Selective retrieval improves the accuracy and efficiency of retrieval-augmented generation (RAG) by reducing distractions from low-quality retrievals. However, existing approaches underutilize the inherent knowledge of large language models (LLMs), leading to suboptimal retrieval decisions and degraded generation performance. To bridge this gap, we propose Self-Routing RAG (SR-RAG), a novel framework that binds selective retrieval with knowledge verbalization. SR-RAG enables an LLM to dynamically decide whether to retrieve external knowledge or verbalize its own parametric knowledge. To this end, we design a multi-task objective that jointly optimizes an LLM for knowledge source selection, knowledge verbalization, and response generation. SR-RAG further incorporates a nearest neighbor search mechanism at inference time to improve the accuracy of knowledge source decisions under domain shifts. Fine-tuning three LLMs with SR-RAG significantly improves their response accuracy and reduces inference latency. Compared to the strongest selective retrieval baseline, SR-RAG reduces the number of retrievals by 29% while improving performance by 5.1%.
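
As a rough illustration of the routing idea described in the abstract, the sketch below picks a knowledge source by majority vote among the nearest stored examples, then passes either retrieved or verbalized context to the generator. It is a minimal sketch under stated assumptions, not the paper's implementation: the names `select_source`, `answer`, `retrieve_fn`, `verbalize_fn`, and `generate_fn`, the use of cosine similarity over plain vectors, and the datastore layout of `(vector, label)` pairs are all hypothetical.

```python
import math

RETRIEVE = "retrieve"    # use external retrieval
VERBALIZE = "verbalize"  # use the model's own verbalized knowledge


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_source(query_vec, datastore, k=3):
    """Majority vote over the k stored examples most similar to the query.

    `datastore` is an assumed list of (vector, label) pairs, where each label
    records which source worked better for that stored example.
    """
    ranked = sorted(datastore, key=lambda item: cosine(query_vec, item[0]), reverse=True)
    top = ranked[:k]
    retrieve_votes = sum(1 for _, label in top if label == RETRIEVE)
    # Strict majority is required to retrieve; ties fall back to verbalization.
    return RETRIEVE if retrieve_votes * 2 > len(top) else VERBALIZE


def answer(query, query_vec, datastore, retrieve_fn, verbalize_fn, generate_fn, k=3):
    """Route the query to one knowledge source, then generate from that context."""
    source = select_source(query_vec, datastore, k)
    context = retrieve_fn(query) if source == RETRIEVE else verbalize_fn(query)
    return source, generate_fn(query, context)
```

In this sketch the retriever is only called when the neighbors favor it, which is where the saving in retrieval calls would come from; how the query vectors and datastore labels are actually obtained is left open here.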
