R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning

Huatong Song, Jinhao Jiang, Wenqing Tian, Zhipeng Chen, Yuhuan Wu, Jiahao Zhao, Yingqian Min, Wayne Xin Zhao, Lei Fang, Ji-Rong Wen

TL;DR

The paper tackles the problem of LLM hallucinations by enabling dynamic switching between internal and external knowledge through the R1-Searcher++ framework. It introduces a two-stage training pipeline (SFT Cold-start and RL for Dynamic Knowledge Acquisition) plus a memorization module to internalize retrieved content, balancing reasoning efficiency with retrieval cost. Empirical results on multi-hop QA show improved accuracy and substantial retrieval reductions compared with prior RAG and RL baselines, with demonstrated generalization to online search. This work offers a practical path toward more robust, knowledge-efficient reasoning in LLMs.

Abstract

Large Language Models (LLMs) are powerful but prone to hallucinations due to static knowledge. Retrieval-Augmented Generation (RAG) helps by injecting external information, but current methods are often costly, generalize poorly, or ignore the internal knowledge of the model. In this paper, we introduce R1-Searcher++, a novel framework designed to train LLMs to adaptively leverage both internal and external knowledge sources. R1-Searcher++ employs a two-stage training strategy: an initial SFT Cold-start phase for preliminary format learning, followed by RL for Dynamic Knowledge Acquisition. The RL stage uses outcome supervision to encourage exploration, incorporates a reward mechanism for internal knowledge utilization, and integrates a memorization mechanism to continuously assimilate retrieved information, thereby enriching the model's internal knowledge. By leveraging both its internal knowledge and an external search engine, the model continuously improves its capabilities, enabling efficient retrieval-augmented reasoning. Our experiments demonstrate that R1-Searcher++ outperforms previous RAG and reasoning methods and achieves efficient retrieval. The code is available at https://github.com/RUCAIBox/R1-Searcher-plus.

Paper Structure

This paper contains 20 sections, 11 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Overall framework of our proposed R1-Searcher++ approach.
  • Figure 2: A qualitative example showing the deliberative reasoning process of RAG-Star in Bamboogle.
  • Figure 3: The log of retrieval count and reward for R1-Searcher and R1-Searcher++ during RL training.