
DualSpec: Accelerating Deep Research Agents via Dual-Process Action Speculation

Shuzhang Zhong, Baotong Lu, Qi Chen, Chuanjie Liu, Fan Yang, Meng Li

TL;DR

Entropy-based analysis shows that Search decisions carry higher uncertainty and benefit significantly from explicit reasoning, whereas Visit decisions have lower entropy and depend primarily on model capacity. Building on this finding, the work proposes DualSpec, a heterogeneous speculation framework equipped with a lightweight, confidence-based semantic verifier.

Abstract

Large language model-based deep research agents have become increasingly popular for addressing long-horizon information-seeking tasks, but they often incur high end-to-end latency due to extensive reasoning and frequent tool use. Speculation frameworks aim to reduce latency by overlapping action execution with reasoning; however, existing approaches typically rely on uniform speculation strategies and strict action matching, which limits inference speedups and robustness. In this work, we revisit the speculate-verify paradigm for deep research agents through the lens of action heterogeneity. We show that Search and Visit actions exhibit fundamentally different reasoning and model capacity requirements: entropy-based analysis reveals that Search decisions have higher uncertainty and benefit significantly from explicit reasoning, whereas Visit decisions have lower entropy and depend primarily on model capacity. Motivated by this dual-process characteristic, we propose DualSpec, a heterogeneous speculation framework equipped with a lightweight, confidence-based semantic verifier. Experiments across multiple models and benchmarks demonstrate that DualSpec achieves up to 3.28× end-to-end speedup while maintaining accuracy comparable to fully reasoning agents.
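To make the idea of a confidence-based check concrete, the following is a minimal sketch, not the paper's verifier. It assumes a drafted action comes with per-token log probabilities, scores the draft by their mean, and accepts it when the score clears a threshold. The function names, the mean-log-probability score, and the threshold value are all hypothetical choices made for illustration; the abstract does not specify how DualSpec computes or thresholds confidence.

```python
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Average per-token log probability of a drafted action (hypothetical score)."""
    if not token_logprobs:
        raise ValueError("need at least one token log probability")
    return sum(token_logprobs) / len(token_logprobs)


def accept_draft(token_logprobs: Sequence[float],
                 threshold: float = math.log(0.8)) -> bool:
    """Accept the draft when its mean log probability meets the threshold.

    The default threshold is an illustrative assumption, not a value from the paper.
    """
    return mean_logprob(token_logprobs) >= threshold


# Illustrative inputs only: a high-confidence draft and a low-confidence draft.
confident_draft = [-0.05, -0.10, -0.02]
uncertain_draft = [-1.20, -0.90, -2.30]

print(accept_draft(confident_draft))
print(accept_draft(uncertain_draft))
```

In a scheme like this, a rejected draft would fall back to the slower path with full reasoning, so the threshold trades speedup against the risk of accepting a poorly aligned action.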

Paper Structure

This paper contains 36 sections, 7 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Deep research agent workflow. Deep research agents follow a Reason-Action-Observation loop, where the agent alternates between generating reasoning traces and executing actions (Search or Visit) to gather information.
  • Figure 2: Deep research inference characteristics using different models. "Miro" denotes "MiroThinker" while "Qwen" denotes "Qwen-3". (a) Tool usage ratio on the GAIA benchmark. (b) Time breakdown per step on model reasoning and tool execution. Model reasoning accounts for a significant fraction of the total latency.
  • Figure 3: Average reasoning length for generating Search and Visit actions across models and benchmarks. Search requires significantly longer reasoning than Visit.
  • Figure 4: Action alignment comparison of two speculative methods relative to the Oracle (large model with reasoning) when drafting Search and Visit. (a) The small reasoning model produces queries more aligned with the Oracle than the large model without reasoning. (b, c) For Visit, the large model skipping reasoning achieves higher accuracy in both URL selection and extraction instruction.
  • Figure 5: Action log probability distributions with and without reasoning. A higher log probability indicates lower uncertainty. Without reasoning (dark blue), Search actions exhibit lower log probabilities than Visit actions, indicating higher baseline decision uncertainty. With reasoning (light blue with hatching), log probabilities increase for both action types, but the increase is significantly larger for Search actions, reflecting a greater reduction in uncertainty due to reasoning.
  • ...and 4 more figures