SE-Search: Self-Evolving Search Agent via Memory and Dense Reward

Jian Li; Yizhang Jin; Dongqi Liu; Hang Ding; Jiafu Wu; Dongsheng Chen; Yunhang Shen; Yulei Qin; Ying Tai; Chengjie Wang; Xiaotong Yuan; Yabiao Wang

TL;DR

This work proposes SE-Search, a Self-Evolving Search agent that improves online search behavior through three components: memory purification, atomic query training, and dense rewards.

Abstract

Retrieval-augmented generation (RAG) reduces hallucinations and factual errors in large language models (LLMs) by conditioning generation on retrieved external knowledge. Recent search agents further cast RAG as an autonomous, multi-turn information-seeking process. However, existing methods often accumulate irrelevant or noisy documents and rely on sparse reinforcement learning signals. We propose Self-Evolving Search (SE-Search), an agent that improves online search behavior through three components: memory purification, atomic query training, and dense rewards. SE-Search follows a Think-Search-Memorize strategy that retains salient evidence while filtering irrelevant content. Atomic query training promotes shorter and more diverse queries, improving evidence acquisition. Dense rewards provide fine-grained feedback that speeds training. Experiments on single-hop and multi-hop question answering benchmarks show that SE-Search-3B outperforms strong baselines, yielding a 10.8-point absolute improvement and a 33.8% relative gain over Search-R1. The code and model weights will be made publicly available upon acceptance.
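The two headline numbers in the abstract are linked by simple arithmetic: a relative gain is the absolute gain divided by the baseline score. The sketch below works backwards from them, assuming that both figures refer to the same averaged metric for Search-R1 (the abstract does not state this explicitly). The function name `implied_scores` is hypothetical, and the printed values are arithmetic implications of the abstract's two figures, not results reported by the authors.

```python
def implied_scores(absolute_gain: float, relative_gain: float) -> tuple[float, float]:
    """Back out baseline and improved scores from an absolute and a relative gain.

    Assumes relative_gain = absolute_gain / baseline, with both gains measured
    on the same metric (an assumption, not something the abstract confirms).
    """
    if relative_gain <= 0:
        raise ValueError("relative_gain must be positive")
    baseline = absolute_gain / relative_gain
    return baseline, baseline + absolute_gain


# Figures taken from the abstract: 10.8 points absolute, 33.8% relative.
baseline, improved = implied_scores(10.8, 0.338)
print(f"implied baseline: {baseline:.1f}, implied improved score: {improved:.1f}")
```

Under that assumption, the implied baseline is about 32 points and the implied SE-Search-3B score is about 43 points.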

Paper Structure (34 sections, 17 equations, 5 figures, 7 tables, 1 algorithm)

Introduction
Problem Formulation
Approach
Self-Evolving Search Agent
Memory Purification.
Atomic Query
Dense Rewards
Agentic Reinforcement Learning
Experiments
Experiment Settings
Benchmarks and Datasets.
Baselines.
Implementation Details.
Main Performance
Ablation Studies
...and 19 more sections

Figures (5)

Figure 1: Training scheme of SE-Search. For each question, the search agent generates diverse trajectories comprising the steps think, search, memorize, and answer. These trajectories are optimized using the GRPO algorithm (Shao et al., 2024) and four carefully designed rewards: Query, Format, Memory, and Outcome.
Figure 2: Prompt template for SE-Search.
Figure 3: SE-Search's evolution of (a) EM accuracy and (b) search calls on training set and benchmarks.
Figure 4: Statistics for (a) search query lengths, (b) search query diversity, (c) accuracy of memory contents.
Figure 5: SE-Search's evolution of EM accuracy across seven benchmarks.
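The training scheme in Figure 1 scores each sampled trajectory with four rewards and optimizes with GRPO, whose core step is normalizing rewards within the group of trajectories sampled for the same question. The sketch below illustrates that normalization only. The equal-weight sum in `combined_reward`, the function names, and the use of the population standard deviation are assumptions made for illustration; this page does not specify how SE-Search combines or weights its rewards.

```python
from statistics import mean, pstdev


def combined_reward(query: float, fmt: float, memory: float, outcome: float,
                    weights: tuple[float, float, float, float] = (1.0, 1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the four reward terms named in Figure 1.

    The equal default weights are a placeholder assumption, not the paper's values.
    """
    return sum(w * r for w, r in zip(weights, (query, fmt, memory, outcome)))


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: standardize each reward within its group.

    Each trajectory's advantage is (reward - group mean) / (group std + eps),
    so no learned value function is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use the sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four trajectories for one question.
group = [
    combined_reward(0.5, 1.0, 0.5, 1.0),
    combined_reward(0.0, 1.0, 0.0, 0.0),
    combined_reward(0.5, 1.0, 1.0, 1.0),
    combined_reward(0.0, 0.0, 0.0, 0.0),
]
print(group_relative_advantages(group))
```

Trajectories that score above the group mean receive positive advantages and those below receive negative ones. When every trajectory in a group earns the same reward, all advantages are zero and the group contributes no gradient, which is one reason denser reward signals can help training.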
