ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization
Xixi Wu, Kuan Li, Yida Zhao, Liwen Zhang, Litu Ou, Huifeng Yin, Zhongwang Zhang, Xinmiao Yu, Dingchu Zhang, Yong Jiang, Pengjun Xie, Fei Huang, Minhao Cheng, Shuai Wang, Hong Cheng, Jingren Zhou
TL;DR
The paper tackles the bottleneck of fixed context windows in knowledge-intensive web search by introducing ReSum, a paradigm that periodically compresses conversation history into compact summaries to enable long-horizon reasoning with minimal architectural changes. It contributes ReSumTool-30B, a specialized summary model tailored for goal-oriented web search, and ReSum-GRPO, an RL adaptation that segments trajectories at summary points and broadcasts trajectory-level advantages to train agents effectively in the summary-conditioned setting. Empirical results across multiple benchmarks show consistent improvements over ReAct, with notable gains after RL adaptation, and demonstrate the approach's applicability to agents with extended context windows, achieving competitive performance with reduced training data. Overall, ReSum offers a lightweight, compatible path to extend the reasoning horizon of web agents, enabling more reliable, evidence-grounded search outcomes in complex, uncertain scenarios.
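To make the idea of periodic compression concrete, the following is a minimal sketch of a summarize-and-restart agent loop. It is not the authors' implementation: the names `resum_rollout`, `step_fn`, `summarize_fn`, and `token_budget` are hypothetical, whitespace word count stands in for a real tokenizer, and the "ANSWER:" prefix is an assumed stop convention.

```python
from typing import Callable, List


def resum_rollout(
    question: str,
    step_fn: Callable[[List[str]], str],       # hypothetical: one agent step (reason + tool call + observation)
    summarize_fn: Callable[[List[str]], str],  # hypothetical: stands in for a summary model
    token_budget: int = 50,
    max_steps: int = 20,
) -> List[List[str]]:
    """Run an agent loop that compresses history whenever it outgrows the budget.

    Returns a list of segments. Each segment is the context as it stood when
    it was summarized; the last segment is the context at termination.
    """
    def n_tokens(ctx: List[str]) -> int:
        # Assumption: whitespace word count approximates token count.
        return sum(len(message.split()) for message in ctx)

    context = [question]
    segments: List[List[str]] = []
    for _ in range(max_steps):
        message = step_fn(context)
        context.append(message)
        if message.startswith("ANSWER:"):  # assumed stop convention
            break
        if n_tokens(context) > token_budget:
            # Close the current segment and restart from question + summary.
            segments.append(context)
            context = [question, summarize_fn(context)]
    segments.append(context)
    return segments
```

Under these assumptions, every restart keeps the original question and replaces everything else with a single summary message, so the working context stays bounded no matter how many steps the agent takes.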
Abstract
Large Language Model (LLM)-based web agents demonstrate strong performance on knowledge-intensive tasks but are hindered by context window limitations in paradigms like ReAct. Complex queries involving multiple entities, intertwined relationships, and high uncertainty demand extensive search cycles that rapidly exhaust context budgets before reaching solutions. To overcome this challenge, we introduce ReSum, a novel paradigm that enables indefinite exploration through periodic context summarization. ReSum converts growing interaction histories into compact reasoning states, maintaining awareness of prior discoveries while bypassing context constraints. For paradigm adaptation, we propose ReSum-GRPO, integrating GRPO with segmented trajectory training and advantage broadcasting to familiarize agents with summary-conditioned reasoning. Extensive experiments on web agents across three benchmarks demonstrate that ReSum delivers an average absolute improvement of 4.5% over ReAct, with further gains of 8.2% following ReSum-GRPO training. Notably, with only 1K training samples, our WebResummer-30B (a ReSum-GRPO-trained version of WebSailor-30B) achieves 33.3% Pass@1 on BrowseComp-zh and 18.3% on BrowseComp-en, surpassing most open-source web agents.
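The advantage-broadcasting step can be illustrated with a small sketch. It assumes the standard GRPO group normalization (reward minus group mean, divided by group standard deviation) and that each rollout's single trajectory-level advantage is copied to all of its segments. The function name `broadcast_advantages`, the use of population standard deviation, and the `eps` stabilizer are illustrative choices, not details confirmed by the abstract.

```python
from statistics import mean, pstdev
from typing import List


def broadcast_advantages(
    rewards: List[float],     # one trajectory-level reward per rollout in the group
    num_segments: List[int],  # number of summary-delimited segments per rollout
    eps: float = 1e-6,        # assumed stabilizer to avoid division by zero
) -> List[List[float]]:
    """Compute group-normalized advantages and copy each to all segments of its rollout."""
    if len(rewards) != len(num_segments):
        raise ValueError("rewards and num_segments must have the same length")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    advantages = [(r - mu) / (sigma + eps) for r in rewards]
    # Every segment of a rollout receives that rollout's advantage.
    return [[a] * k for a, k in zip(advantages, num_segments)]


# Usage: four rollouts for one query, two of which answered correctly.
per_segment = broadcast_advantages([1.0, 0.0, 0.0, 1.0], [3, 1, 2, 1])
```

In this sketch a rollout that was summarized twice (three segments) contributes three training sequences that all share the same positive or negative signal, which is one way to credit intermediate summary-conditioned steps for the final outcome.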
