REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents

Zheng Chu, Xiao Wang, Jack Hong, Huiming Fan, Yuqi Huang, Yue Yang, Guohai Xu, Chenxiao Zhao, Cheng Xiang, Shengchao Hu, Dongdong Kuang, Ming Liu, Bing Qin, Xing Yu

TL;DR

This work proposes REDSearcher, a unified framework that co-designs complex task synthesis, mid-training, and post-training for scalable search-agent optimization. Among its improvements, it frames task synthesis as a dual-constrained optimization, where task difficulty is precisely governed by graph topology and evidence dispersion.

Abstract

Large language models are transitioning from general-purpose knowledge engines to real-world problem solvers, yet optimizing them for deep search tasks remains challenging. The central bottleneck lies in the extreme sparsity of high-quality search trajectories and reward signals, arising from the difficulty of scalable long-horizon task construction and the high cost of interaction-heavy rollouts involving external tool calls. To address these challenges, we propose REDSearcher, a unified framework that co-designs complex task synthesis, mid-training, and post-training for scalable search-agent optimization. Specifically, REDSearcher introduces the following improvements: (1) We frame task synthesis as a dual-constrained optimization, where task difficulty is precisely governed by graph topology and evidence dispersion, allowing scalable generation of complex, high-quality tasks. (2) We introduce tool-augmented queries to encourage proactive tool use rather than passive recall. (3) During mid-training, we strengthen core atomic capabilities (knowledge, planning, and function calling), substantially reducing the cost of collecting high-quality trajectories for downstream training. (4) We build a local simulated environment that enables rapid, low-cost algorithmic iteration for reinforcement learning experiments. Across both text-only and multimodal search-agent benchmarks, our approach achieves state-of-the-art performance. To facilitate future research on long-horizon search agents, we will release 10K high-quality complex text search trajectories, 5K multimodal trajectories, and a 1K text RL query set, together with code and model checkpoints.

Paper Structure (65 sections, 6 equations, 10 figures, 3 tables)

Figures (10)

  • Figure 1: Benchmark performance of REDSearcher.
  • Figure 2: Increasing reasoning complexity as a function of graph treewidth. From left to right, the dependency structure evolves from a simple chain ($k=1$), to a cyclic constraint graph ($k=2$), and finally to a fully coupled tetrahedral structure ($k=3$). Green nodes denote given entities and red nodes denote the final answer, while yellow nodes represent intermediate reasoning variables. Higher treewidth corresponds to larger jointly maintained variable sets and stronger global consistency constraints, transforming reasoning from linear propagation to high-dimensional constraint satisfaction.
  • Figure 3: Overview of the scalable complex task synthesis pipeline. The process operates via a dual-pathway mechanism to maximize both structural complexity and information dispersion, followed by a rigorous solver-based verification stage.
  • Figure 4: Mid-training and post-training stages for REDSearcher.
  • Figure 5: Two-stage agentic mid-training framework.
  • ...and 5 more figures
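
The treewidth values quoted in the Figure 2 caption (chain $k=1$, cycle $k=2$, tetrahedron $k=3$) can be checked on toy graphs. The sketch below is not the paper's synthesis code; it is a brute-force illustration that assumes four-node graphs with hypothetical node labels `a` to `d`, and computes treewidth as the minimum, over all elimination orderings, of the largest neighbor set seen when a vertex is eliminated. This is only practical for very small graphs.

```python
from itertools import combinations, permutations


def elimination_width(nodes, edges, order):
    """Width of one elimination ordering: the largest number of
    remaining neighbors any vertex has at the moment it is eliminated."""
    adj = {v: set() for v in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    width = 0
    for v in order:
        nbrs = adj.pop(v)
        width = max(width, len(nbrs))
        # Eliminating v turns its remaining neighbors into a clique.
        for a in nbrs:
            adj[a].discard(v)
            adj[a] |= nbrs - {a}
    return width


def treewidth(nodes, edges):
    """Exact treewidth by exhaustive search over elimination orderings."""
    return min(elimination_width(nodes, edges, order)
               for order in permutations(nodes))


# Hypothetical four-node examples mirroring the three shapes in Figure 2.
nodes = ["a", "b", "c", "d"]
chain = [("a", "b"), ("b", "c"), ("c", "d")]
cycle = chain + [("d", "a")]
tetrahedron = list(combinations(nodes, 2))  # complete graph on 4 nodes

print(treewidth(nodes, chain))        # 1
print(treewidth(nodes, cycle))        # 2
print(treewidth(nodes, tetrahedron))  # 3
```

The width of an ordering is the size of the largest variable set that has to be held jointly while reasoning, which matches the caption's description of higher treewidth as "larger jointly maintained variable sets".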