ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking

Baixuan Li, Dingchu Zhang, Jialong Wu, Wenbiao Yin, Zhengwei Tao, Yida Zhao, Liwen Zhang, Haiyang Shen, Runnan Fang, Pengjun Xie, Jingren Zhou, Yong Jiang

TL;DR

ParallelMuse addresses two problems that arise when parallel thinking is applied to deep information-seeking agents: inefficient exploration and limited context capacity. It introduces a two-stage paradigm: Functionality-Specified Partial Rollout, which steers exploration using the uncertainty of functional regions, and Compressed Reasoning Aggregation, which condenses intermediate reasoning into structured reports for coherence-driven final answer synthesis. Empirical results across multiple open-source agents and benchmarks show up to $62\%$ performance gains with $10$–$30\%$ fewer exploratory tokens, driven by both context reuse and aggressive trajectory compression. The work provides practical design principles for scalable, efficient agentic reasoning and highlights the potential of cross-model aggregation strategies to further improve performance.

Abstract

Parallel thinking expands exploration breadth, complementing the deep exploration of information-seeking (IS) agents to further enhance problem-solving capability. However, conventional parallel thinking faces two key challenges in this setting: inefficiency from repeatedly rolling out from scratch, and difficulty in integrating long-horizon reasoning trajectories during answer generation, as limited context capacity prevents full consideration of the reasoning process. To address these issues, we propose ParallelMuse, a two-stage paradigm designed for deep IS agents. The first stage, Functionality-Specified Partial Rollout, partitions generated sequences into functional regions and performs uncertainty-guided path reuse and branching to enhance exploration efficiency. The second stage, Compressed Reasoning Aggregation, exploits reasoning redundancy to losslessly compress information relevant to answer derivation and synthesize a coherent final answer. Experiments across multiple open-source agents and benchmarks demonstrate up to 62% performance improvement with a 10–30% reduction in exploratory token consumption.
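
The first stage can be illustrated with a minimal sketch of uncertainty-guided branching. This is not the authors' implementation: the `Step` structure, the region names, and the use of mean negative log-probability as the uncertainty score are all assumptions made for illustration, since the abstract does not specify how uncertainty is computed. The sketch scores one functional region per step, picks the top-k most uncertain steps as branch points, and reuses the trajectory prefix before each branch point instead of rolling out from scratch.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One agent step. Hypothetical structure, not taken from the paper."""
    index: int
    # Token log-probabilities grouped by functional region,
    # e.g. {"reasoning": [...], "tool_call": [...]} (region names are assumed).
    region_logprobs: dict[str, list[float]] = field(default_factory=dict)


def region_uncertainty(step: Step, region: str) -> float:
    """Assumed proxy for uncertainty: mean negative log-probability of the
    tokens in the given region. Returns 0.0 if the region is empty."""
    logprobs = step.region_logprobs.get(region, [])
    if not logprobs:
        return 0.0
    return -sum(logprobs) / len(logprobs)


def top_k_branch_steps(steps: list[Step], k: int, region: str = "tool_call") -> list[int]:
    """Return the indices of the k steps with the highest uncertainty in
    `region`, in trajectory order. Ties are broken in favor of earlier steps."""
    scored = sorted(
        steps,
        key=lambda s: (-region_uncertainty(s, region), s.index),
    )
    return sorted(s.index for s in scored[:k])


def reusable_prefix(steps: list[Step], branch_index: int) -> list[Step]:
    """Steps before the branch point are kept as shared context; only the
    continuation from `branch_index` onward would be re-sampled."""
    return [s for s in steps if s.index < branch_index]
```

Under these assumptions, a partial rollout would call `top_k_branch_steps` on a finished trajectory and restart generation from each returned index with `reusable_prefix` as context, which is where the token savings from context reuse would come from.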

Paper Structure

This paper contains 28 sections, 6 equations, 5 figures, 4 tables.

Figures (5)

  • Figure 1: KDE-smoothed distribution of steps with top-4 uncertainty on the BrowseComp subset (truncated to earlier steps as later ones are typically more certain). DeepSeek-V3.1-T denotes DeepSeek-V3.1-Terminus, and Tongyi-DR denotes Tongyi-DeepResearch-30B-A3B.
  • Figure 2: Average entity count per task and per model, where entities are extracted by GPT-4.1 based on the complete reasoning trajectory and ground-truth answer.
  • Figure 3: Workflow of ParallelMuse, including (Left) the Functionality-Specified Partial Rollout, where Get Branch shows the selection of top-$k$ steps based on (exploration) tool-call uncertainty (shown only as one example of a branching criterion), and (Right) the Compressed Reasoning Aggregation.
  • Figure 4: Performance gains from different answer generation methods, with sampling fixed to 8 from-scratch rollouts to isolate sampling (exploration) effects.
  • Figure 5: Efficiency gains using ParallelMuse. (i) (Left) Token reduction through context reuse in our partial rollout method. We take the token consumption per trajectory of the from-scratch rollout as the baseline. The green bars represent the token cost after applying partial rollout (the numbers above indicate the ratio relative to the baseline), while the remaining blue bars show the proportion of tokens saved. (ii) (Right) Comparison of context token usage before and after trajectory compression.
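
The ratio reported above the bars in Figure 5 (Left) can be read as simple arithmetic on per-trajectory token counts. The helper below is an illustrative sketch only; the function name and the example numbers in the usage note are hypothetical and are not results from the paper.

```python
def token_cost_ratio(baseline_tokens: int, partial_tokens: int) -> tuple[float, float]:
    """Return (ratio, saved_fraction) for a partial rollout measured against
    the from-scratch baseline.

    ratio          = partial_tokens / baseline_tokens   (the green bar)
    saved_fraction = share of baseline tokens avoided   (the blue bar)
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    ratio = partial_tokens / baseline_tokens
    saved_fraction = (baseline_tokens - partial_tokens) / baseline_tokens
    return ratio, saved_fraction
```

For instance, with made-up counts of 1000 baseline tokens and 800 tokens after partial rollout, the ratio would be 0.8 and the saved fraction 0.2.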