Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Qianben Chen, Tianrui Qin, King Zhu, Qiexiang Wang, Chengjun Yu, Shu Xu, Jiaqi Wu, Jiayu Zhang, Xinpeng Liu, Xin Gui, Jingyi Cao, Piaohong Wang, Dingfeng Shi, He Zhu, Tiannan Wang, Yuqing Wang, Maojia Song, Tianyu Zheng, Ge Zhang, Jian Yang, Jiaheng Liu, Minghao Liu, Yuchen Eleanor Jiang, Wangchunshu Zhou

TL;DR

This work proposes Search More, Think Less (SMTL), a framework for long-horizon agentic search that targets both efficiency and generalization. It also introduces a unified data synthesis pipeline that constructs search tasks spanning both deterministic question answering and open-ended research scenarios, with task-appropriate evaluation metrics.

Abstract

Recent deep research agents primarily improve performance by scaling reasoning depth, but this leads to high inference cost and latency in search-intensive scenarios. Moreover, generalization across heterogeneous research settings remains challenging. In this work, we propose Search More, Think Less (SMTL), a framework for long-horizon agentic search that targets both efficiency and generalization. SMTL replaces sequential reasoning with parallel evidence acquisition, enabling efficient context management under constrained context budgets. To support generalization across task types, we further introduce a unified data synthesis pipeline that constructs search tasks spanning both deterministic question answering and open-ended research scenarios, with task-appropriate evaluation metrics. We train an end-to-end agent using supervised fine-tuning and reinforcement learning, achieving strong and often state-of-the-art performance across benchmarks including BrowseComp (48.6%), GAIA (75.7%), Xbench (82.0%), and DeepResearch Bench (45.9%). Compared to MiroThinker-v1.0, SMTL with a maximum of 100 interaction steps reduces the average number of reasoning steps on BrowseComp by 70.7% while improving accuracy.
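The idea of replacing sequential reasoning with parallel evidence acquisition can be illustrated with a minimal sketch. It is not the authors' implementation: `search`, `gather_evidence`, and `trim_context` are hypothetical names, the search tool is a stub, and the keep-the-last-`k` policy is only an assumed stand-in for context management under a fixed budget.

```python
from concurrent.futures import ThreadPoolExecutor


def search(query: str) -> str:
    """Hypothetical stand-in for a real search tool; returns a fake snippet."""
    return f"evidence for: {query}"


def gather_evidence(subqueries: list[str], max_workers: int = 4) -> list[str]:
    """Issue all sub-queries in a single step instead of one per reasoning turn.

    Results come back in the same order as the input sub-queries.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(search, subqueries))


def trim_context(observations: list[str], k: int) -> list[str]:
    """Keep only the k most recent observations (assumed context policy)."""
    return observations[-k:] if k > 0 else []


if __name__ == "__main__":
    evidence = gather_evidence(["who founded X", "when was X founded"])
    print(trim_context(evidence, k=1))
```

In this sketch one interaction step covers several sub-queries, so fewer sequential steps are needed to collect the same evidence, and the trimming step bounds how much of it stays in context.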

Paper Structure (48 sections, 2 equations, 5 figures, 2 tables, 1 algorithm)

Figures (5)

  • Figure 1: Overview of SMTL Performance. (a) Efficiency on BrowseComp. All methods are evaluated with their default inference settings. (b) Generalization across benchmarks. Comprehensiveness, depth, instruction following, and readability are measured following DeepResearch Bench (du2025deepresearchbench). We report our model's performance for Deep xxxxx
  • Figure 2: Overview of our parallel agentic workflow design.
  • Figure 3: Overview of the data construction pipeline.
  • Figure 4: Results of context management (CM) under different observation horizons $K$ with interaction budgets (IB) of 80 and 160 steps on the BrowseComp (BC) benchmark.
  • Figure 5: Case study illustration comparing SMTL-30B and MiroThinker-v1.0-30B. SMTL performs parallel subtask execution with staged re-planning, enabling faster localization and verification of key evidence, while MiroThinker-v1.0-30B follows a strictly sequential search process.