ParallelMuse: Agentic Parallel Thinking for Deep Information Seeking
Baixuan Li, Dingchu Zhang, Jialong Wu, Wenbiao Yin, Zhengwei Tao, Yida Zhao, Liwen Zhang, Haiyang Shen, Runnan Fang, Pengjun Xie, Jingren Zhou, Yong Jiang
TL;DR
ParallelMuse addresses two obstacles to applying parallel thinking to deep information-seeking agents: the inefficiency of rolling out every trajectory from scratch, and the limited context available for integrating long reasoning trajectories. It introduces a two-stage paradigm: Functionality-Specified Partial Rollout, which steers exploration by the uncertainty of each functional region, and Compressed Reasoning Aggregation, which condenses intermediate reasoning into structured reports for coherence-driven final answer synthesis. Empirical results across multiple open-source agents and benchmarks show up to $62\%$ performance gains with $10$–$30\%$ fewer exploratory tokens, driven by both context reuse and aggressive trajectory compression. The work provides practical design principles for scalable, efficient agentic reasoning and highlights the potential of cross-model aggregation strategies to further improve performance.
Abstract
Parallel thinking expands exploration breadth, complementing the deep exploration of information-seeking (IS) agents to further enhance problem-solving capability. However, conventional parallel thinking faces two key challenges in this setting: inefficiency from repeatedly rolling out from scratch, and difficulty in integrating long-horizon reasoning trajectories during answer generation, as limited context capacity prevents full consideration of the reasoning process. To address these issues, we propose ParallelMuse, a two-stage paradigm designed for deep IS agents. The first stage, Functionality-Specified Partial Rollout, partitions generated sequences into functional regions and performs uncertainty-guided path reuse and branching to enhance exploration efficiency. The second stage, Compressed Reasoning Aggregation, exploits reasoning redundancy to losslessly compress information relevant to answer derivation and synthesize a coherent final answer. Experiments across multiple open-source agents and benchmarks demonstrate up to 62% performance improvement with a 10–30% reduction in exploratory token consumption.
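
The uncertainty-guided branching in the first stage can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that uncertainty is measured as token-level perplexity within one functional region, that each agent step exposes per-region token log-probabilities, and that branching happens at the k most uncertain steps. The names `Step`, `pick_branch_points`, `reused_prefix`, and the region labels are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    """One agent step. Region names (e.g. "reasoning") are hypothetical labels."""
    logprobs: Dict[str, List[float]]  # region name -> token log-probabilities


def perplexity(token_logprobs: List[float]) -> float:
    """exp of the mean negative log-probability; higher means more uncertain."""
    if not token_logprobs:
        return 1.0  # assumption: an empty region counts as fully certain
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_branch_points(trajectory: List[Step], region: str, k: int) -> List[int]:
    """Indices of the k steps most uncertain in `region`, in trajectory order."""
    scored = [(perplexity(s.logprobs.get(region, [])), i)
              for i, s in enumerate(trajectory)]
    # Sort by descending perplexity; break ties by earlier step index.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    return sorted(i for _, i in top)


def reused_prefix(trajectory: List[Step], branch_at: int) -> List[Step]:
    """Steps kept verbatim; a new partial rollout would resume at `branch_at`."""
    return trajectory[:branch_at]


trajectory = [
    Step({"reasoning": [-0.1, -0.1]}),
    Step({"reasoning": [-2.0, -1.0]}),
    Step({"reasoning": [-0.5]}),
]
branch_points = pick_branch_points(trajectory, "reasoning", k=2)
prefixes = [reused_prefix(trajectory, i) for i in branch_points]
```

Under these assumptions, each branch reuses the prefix before its branching step instead of regenerating it, which is the kind of context reuse the abstract credits for the reduction in exploratory tokens.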
