RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection

Daocheng Fu; Jianbiao Mei; Licheng Wen; Xuemeng Yang; Cheng Yang; Rong Wu; Tao Hu; Siqi Li; Yufan Shen; Xinyu Cai; Pinlong Cai; Botian Shi; Yong Liu; Yu Qiao

TL;DR

RE-Searcher tackles the fragility of agentic search by introducing explicit goal-oriented planning and self-reflection to counteract environmental complexity. The method combines a structured chat template for explicit searching, GRPO-based training with a search engine, and LLM-based reflection supervision to guide robust decision-making. Empirical results show state-of-the-art accuracy on both in-domain and out-of-domain tasks and improved robustness under noisy or misleading signals. The work offers practical insights for deploying autonomous LLM-powered agents in dynamic environments and points to stronger supervision and better data as avenues for more reliable performance.

Abstract

Large language models (LLMs) excel at knowledge-intensive question answering and reasoning, yet their real-world deployment remains constrained by knowledge cutoff, hallucination, and limited interaction modalities. Augmenting LLMs with external search tools helps alleviate these issues, but it also exposes agents to a complex search environment in which small, plausible variations in query formulation can steer reasoning into unproductive trajectories and amplify errors. We present a systematic analysis that quantifies how environmental complexity induces fragile search behaviors and, in turn, degrades overall performance. To address this challenge, we propose a simple yet effective approach to instantiate a search agent, RE-Searcher. During search, RE-Searcher explicitly articulates a concrete search goal and subsequently reflects on whether the retrieved evidence satisfies that goal. This combination of goal-oriented planning and self-reflection enables RE-Searcher to resist spurious cues in complex search environments and perform robust search. Extensive experiments show that our method improves search accuracy and achieves state-of-the-art results. Perturbation studies further demonstrate substantial resilience to noisy or misleading external signals, mitigating the fragility of the search process. We believe these findings offer practical guidance for integrating LLM-powered agents into more complex interactive environments and enabling more autonomous decision-making.
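The goal-then-reflect loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `goal_reflect_search`, the `Step` record, the `plan`/`search`/`reflect` callables, the `max_turns` limit, and the toy corpus are all assumptions made for illustration. In the actual system these roles would be played by the LLM and a real search engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    goal: str             # what this search turn is meant to find
    query: str            # the query actually sent to the search tool
    evidence: List[str]   # retrieved passages
    satisfied: bool       # reflection verdict: does the evidence meet the goal?


def goal_reflect_search(
    question: str,
    plan: Callable[[str, List[Step]], Tuple[str, str]],
    search: Callable[[str], List[str]],
    reflect: Callable[[str, List[str]], bool],
    max_turns: int = 4,   # hypothetical budget, not taken from the paper
) -> List[Step]:
    """Run plan -> search -> reflect turns until a goal is satisfied."""
    history: List[Step] = []
    for _ in range(max_turns):
        # 1. State an explicit goal and a query before searching.
        goal, query = plan(question, history)
        # 2. Retrieve evidence from the (possibly noisy) environment.
        evidence = search(query)
        # 3. Check whether the evidence actually satisfies the stated goal.
        satisfied = reflect(goal, evidence)
        history.append(Step(goal, query, evidence, satisfied))
        if satisfied:
            break
    return history


# Toy stand-ins for the LLM planner, search engine, and reflection step.
CORPUS = {
    "France": ["France is a country in Western Europe."],
    "capital of France": ["Paris is the capital of France."],
}


def toy_plan(question: str, history: List[Step]) -> Tuple[str, str]:
    goal = "capital of France"
    # Reformulate the query if the previous attempt failed reflection.
    query = "France" if not history else "capital of France"
    return goal, query


def toy_search(query: str) -> List[str]:
    return CORPUS.get(query, [])


def toy_reflect(goal: str, evidence: List[str]) -> bool:
    return any(goal.lower() in doc.lower() for doc in evidence)


if __name__ == "__main__":
    trace = goal_reflect_search(
        "What is the capital of France?", toy_plan, toy_search, toy_reflect
    )
    for step in trace:
        print(step.query, "->", step.satisfied)
```

In this toy run the first query retrieves a passage that does not meet the stated goal, so reflection rejects it and the planner reformulates. Tying each query to an explicit goal is what lets the loop notice and recover from an unproductive result instead of reasoning onward from it.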

Figures (8)