RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection

Daocheng Fu, Jianbiao Mei, Licheng Wen, Xuemeng Yang, Cheng Yang, Rong Wu, Tao Hu, Siqi Li, Yufan Shen, Xinyu Cai, Pinlong Cai, Botian Shi, Yong Liu, Yu Qiao

TL;DR

RE-Searcher tackles the fragility of agentic search by introducing explicit goal-oriented planning and self-reflection to counteract environmental complexity. The method combines a structured chat template for explicit searching, GRPO-based training with a search engine, and LLM-based reflection supervision to guide robust decision-making. Empirical results show state-of-the-art accuracy on both in-domain and out-of-domain tasks and demonstrate improved robustness under noisy or misleading signals. The work offers practical insights for deploying autonomous LLM-powered agents in dynamic environments and points to stronger supervision and better training data as avenues toward even more reliable performance.
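GRPO (Group Relative Policy Optimization) drops the learned value model. Instead, it samples a group of rollouts for the same question and normalizes each rollout's reward by the group's mean and standard deviation. The following is a minimal sketch of that normalization step. The function name, the epsilon, and the binary correctness rewards are illustrative assumptions. The paper's actual reward, which also involves LLM-based reflection supervision, is not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each rollout's reward within its group.

    rewards: scalar rewards for G rollouts sampled for the same question.
    Returns one advantage per rollout, (r_i - mean) / (std + eps).
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of 4 rollouts for one question, scored only by
# answer correctness (an assumption, not the paper's reward design).
rewards = [1.0, 0.0, 0.0, 1.0]
adv = group_relative_advantages(rewards)
print([round(a, 3) for a in adv])  # [1.0, -1.0, -1.0, 1.0]
```

Rollouts that beat their own group receive positive advantages and the rest receive negative ones. If every rollout in a group earns the same reward, all advantages are zero and that group contributes no learning signal.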

Abstract

Large language models (LLMs) excel at knowledge-intensive question answering and reasoning, yet their real-world deployment remains constrained by knowledge cutoff, hallucination, and limited interaction modalities. Augmenting LLMs with external search tools helps alleviate these issues, but it also exposes agents to a complex search environment in which small, plausible variations in query formulation can steer reasoning into unproductive trajectories and amplify errors. We present a systematic analysis that quantifies how environmental complexity induces fragile search behaviors and, in turn, degrades overall performance. To address this challenge, we propose a simple yet effective approach to instantiate a search agent, RE-Searcher. During search, RE-Searcher explicitly articulates a concrete search goal and subsequently reflects on whether the retrieved evidence satisfies that goal. This combination of goal-oriented planning and self-reflection enables RE-Searcher to resist spurious cues in complex search environments and perform robust search. Extensive experiments show that our method improves search accuracy and achieves state-of-the-art results. Perturbation studies further demonstrate substantial resilience to noisy or misleading external signals, mitigating the fragility of the search process. We believe these findings offer practical guidance for integrating LLM-powered agents into more complex interactive environments and enabling more autonomous decision-making.
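The plan, search and reflect cycle described in the abstract can be pictured as a control loop. The sketch below is hypothetical. In RE-Searcher the goal, the query and the reflection verdict are all generated by the trained model itself. Here, `toy_plan`, `toy_search` and `toy_reflect` are hand-written stand-ins, and the stop-on-success and retry logic is an illustrative assumption. Every name in the block is invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    goal: str            # what this search step is supposed to establish
    query: str           # the concrete query issued for that goal
    evidence: list[str]  # retrieved passages
    satisfied: bool      # reflection verdict: does the evidence meet the goal?


def goal_reflect_search(
    question: str,
    plan: Callable[[str, list[Step]], tuple[str, str]],
    search: Callable[[str], list[str]],
    reflect: Callable[[str, list[str]], bool],
    max_steps: int = 4,
) -> list[Step]:
    """Hypothetical loop: state a goal, search, reflect, re-plan if unmet."""
    trace: list[Step] = []
    for _ in range(max_steps):
        goal, query = plan(question, trace)  # planner sees earlier failed steps
        evidence = search(query)
        ok = reflect(goal, evidence)
        trace.append(Step(goal, query, evidence, ok))
        if ok:  # goal satisfied: stop searching
            break
    return trace


# Toy stand-ins (hypothetical): a two-entry "search engine" and a
# keyword check in place of model-generated reflection.
CORPUS = {
    "inception film": ["Inception is a 2010 science fiction film."],
    "inception director": ["Inception was directed by Christopher Nolan."],
}
CANDIDATE_QUERIES = ["inception film", "inception director"]


def toy_plan(question, trace):
    tried = {s.query for s in trace}
    query = next((q for q in CANDIDATE_QUERIES if q not in tried), CANDIDATE_QUERIES[-1])
    return "find who directed Inception", query


def toy_search(query):
    return CORPUS.get(query, [])


def toy_reflect(goal, evidence):
    return any("directed by" in passage for passage in evidence)


trace = goal_reflect_search("Who directed Inception?", toy_plan, toy_search, toy_reflect)
for step in trace:
    print(step.query, "->", step.satisfied)
# inception film -> False
# inception director -> True
```

The first query is plausible but off-goal. Reflection rejects its evidence, so the loop re-plans instead of answering from an unproductive trajectory. This is the kind of recovery that Figure 1 illustrates.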

Paper Structure

This paper contains 23 sections, 5 equations, 8 figures, 4 tables, 1 algorithm.

Figures (8)

  • Figure 1: A search path can be viewed as a sample from the keyword graph. Given the same query in two independent runs, the search agent generates two distinct sets of keywords. Although both sets are semantically sound, the retrieved results differ dramatically. RE-Searcher, a search agent endowed with goal-oriented planning and self-reflection (orange arrow), can recover from such missteps and return to the correct trajectory, enabling robust search behavior.
  • Figure 2: Accuracy of search agents based on different models. "Always right" is the fraction of instances in which all attempts are correct; "random right" is the fraction in which at least one attempt is correct.
  • Figure 3: Cosine similarity of the search results obtained from queries before and after perturbation; the red dot indicates the mean similarity.
  • Figure 4: Illustration of the proposed training method. Left: the model must explicitly plan its search goals during the search process and reflect on the results after obtaining them; an external LLM monitors the reflections of the model being trained to ensure that its judgments are correct. Right: a search trajectory produced by the trained agentic model, showing correct reflection and goal planning.
  • Figure 5: The training dynamics of the reflection value of different models.
  • ...and 3 more figures
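The metrics named in the Figure 2 and Figure 3 captions are simple to compute. The sketch below is illustrative only. It assumes that per-instance attempt outcomes are stored as booleans, and that each set of search results has already been mapped to a nonzero embedding vector. The captions do not say which encoder produces these vectors, and all function names are hypothetical.

```python
import math


def always_and_random_right(attempts):
    """attempts: one list of booleans per instance, one entry per independent run.

    always right: fraction of instances where every attempt is correct.
    random right: fraction of instances where at least one attempt is correct.
    """
    n = len(attempts)
    always = sum(all(a) for a in attempts) / n
    random_right = sum(any(a) for a in attempts) / n
    return always, random_right


def cosine(u, v):
    """Cosine similarity of two equal-length, nonzero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / norm


def mean_perturbation_similarity(pairs):
    """pairs: (embedding before perturbation, embedding after) for each query.

    Returns the mean similarity (the red dot in Figure 3).
    """
    sims = [cosine(u, v) for u, v in pairs]
    return sum(sims) / len(sims)


# Hypothetical data: 4 instances with 3 attempts each.
attempts = [[True, True, True], [True, False, True], [False, False, False], [False, True, False]]
print(always_and_random_right(attempts))  # (0.25, 0.75)

# Hypothetical 2-d embeddings: one unchanged result set, one fully changed.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
print(mean_perturbation_similarity(pairs))  # 0.5
```

A wide gap between "random right" and "always right" indicates that the agent can often find the answer but does not do so reliably. This is the fragility that the paper's analysis quantifies.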