WideSeek-R1: Exploring Width Scaling for Broad Information Seeking via Multi-Agent Reinforcement Learning

Zelai Xu; Zhexuan Xu; Ruize Zhang; Chunyang Zhu; Shi Yu; Weilin Liu; Quanlu Zhang; Wenbo Ding; Chao Yu; Yu Wang

TL;DR

WideSeek-R1 shows that width scaling, implemented as a lead-agent-subagent framework trained jointly with multi-agent reinforcement learning (MARL), can tackle broad information seeking through parallel execution, using a shared LLM with isolated contexts. Trained on a new dataset of 20k tasks, WideSeek-R1-4B reaches an item F1 score of 40.0% on the WideSearch benchmark, comparable to single-agent DeepSeek-R1-671B, and improves consistently as the number of parallel subagents increases. The work positions width scaling as a complement to depth scaling, supports this with ablations and standard QA benchmarks, and provides a dataset for future research on scalable multi-agent information gathering.

Abstract

Recent advancements in Large Language Models (LLMs) have largely focused on depth scaling, where a single agent solves long-horizon problems with multi-turn reasoning and tool use. However, as tasks grow broader, the key bottleneck shifts from individual competence to organizational capability. In this work, we explore a complementary dimension of width scaling with multi-agent systems to address broad information seeking. Existing multi-agent systems often rely on hand-crafted workflows and turn-taking interactions that fail to parallelize work effectively. To bridge this gap, we propose WideSeek-R1, a lead-agent-subagent framework trained via multi-agent reinforcement learning (MARL) to synergize scalable orchestration and parallel execution. By utilizing a shared LLM with isolated contexts and specialized tools, WideSeek-R1 jointly optimizes the lead agent and parallel subagents on a curated dataset of 20k broad information-seeking tasks. Extensive experiments show that WideSeek-R1-4B achieves an item F1 score of 40.0% on the WideSearch benchmark, which is comparable to the performance of single-agent DeepSeek-R1-671B. Furthermore, WideSeek-R1-4B exhibits consistent performance gains as the number of parallel subagents increases, highlighting the effectiveness of width scaling.
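The lead-agent-subagent pattern described in the abstract (one shared LLM, isolated contexts, parallel execution) can be illustrated with a minimal Python sketch. This is not the authors' implementation: `shared_llm`, `AgentContext`, `run_subagent`, and `run_lead_agent` are hypothetical names, the model call is a stub that echoes its prompt, and the subtask list is passed in directly rather than produced by the lead agent.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Isolated message history for one agent (lead or subagent)."""
    role: str
    messages: list = field(default_factory=list)


async def shared_llm(context: AgentContext, prompt: str) -> str:
    """Hypothetical stand-in for the single shared LLM; it echoes instead of generating."""
    context.messages.append(prompt)
    await asyncio.sleep(0)  # yield control, as a real model call would
    return f"[{context.role}] result for: {prompt}"


async def run_subagent(index: int, subtask: str) -> str:
    # Each subagent gets a fresh context, so it never sees sibling histories.
    context = AgentContext(role=f"subagent-{index}")
    return await shared_llm(context, subtask)


async def run_lead_agent(query: str, subtasks: list[str]) -> list[str]:
    # Assumption: the decomposition into subtasks is given; in the paper's
    # setting the lead agent would produce it with the shared LLM.
    lead = AgentContext(role="lead")
    lead.messages.append(query)
    # Fan out all subtasks at once instead of taking turns.
    results = await asyncio.gather(
        *(run_subagent(i, task) for i, task in enumerate(subtasks))
    )
    # Only the returned results, not the subagents' histories, reach the lead context.
    lead.messages.extend(results)
    return list(results)
```

Calling `asyncio.run(run_lead_agent(query, subtasks))` returns one result per subtask in the order the subtasks were given, since `asyncio.gather` preserves input order. Width scaling in this sketch corresponds to lengthening the `subtasks` list.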

Paper Structure (49 sections, 12 equations, 8 figures, 3 tables)

Introduction
Related Work
WideSeek-R1
Lead Agent for Scalable Orchestration
Subagents for Parallel Execution
Multi-Agent Reinforcement Learning
Training Data Construction
Query Generation
Answer Generation
QA Pair Filtering
Experiments
Main Results
Exploring Width Scaling
Ablation Studies
Standard QA Benchmarks
...and 34 more sections

Figures (8)

Figure 1: Comparison of depth and width scaling. While depth scaling enhances performance through sequential multi-turn interactions, width scaling orchestrates multi-agent systems for parallel execution. WideSeek-R1 pushes the frontier of width scaling via MARL for synergized orchestration and execution.
Figure 2: Overview of WideSeek-R1 Rollout and Training Pipeline. (1) Rollout: The lead agent coordinates task decomposition while subagents execute parallel subtasks using external tools. (2) Training: We adopt group-level advantage normalization and assign the same advantage to all agents within each multi-agent system, followed by a dual-level advantage reweighting mechanism at both token level and agent level applied to the GRPO objective for effective multi-agent, multi-turn RL training.
Figure 3: Overview of our Automated Data Construction Pipeline. The pipeline comprises three stages: (1) Query Generation, where we extract user intents from HybridQA (Chen et al., 2020) and refine them into complex, schema-constrained queries that mandate specific table structures and broad coverage; (2) Answer Generation, where we prompt the model to generate two responses independently along with the unique column(s), enabling self-consistency verification; and (3) QA Pair Filtering, where we rigorously screen the data by discarding instances with low consistency or insufficient difficulty, ensuring that only robust and challenging samples remain in the final dataset. An icon in the figure marks the steps powered by the gemini-3-pro-preview API.
Figure 4: Comparison of depth and width scaling in performance with respect to (w.r.t.) test-time compute. The blue curve shows depth scaling in performance w.r.t. the number of turns (bottom axis), while the two red curves show width scaling in performance w.r.t. the number of subagents (top axis).
Figure 5: Ablation study on lead agent and subagents by assigning WideSeek-R1-4B to different roles.
...and 3 more figures
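The training step summarized in the Figure 2 caption (group-level advantage normalization, with the same advantage assigned to every agent in a multi-agent system) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function names are hypothetical, the population standard deviation and the `eps` stabilizer are assumed choices, and the dual-level (token-level and agent-level) reweighting is omitted because the caption does not specify its form.

```python
from statistics import mean, pstdev


def group_normalized_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize one reward per rollout against the group's mean and spread.

    Assumption: each entry is the scalar reward of one multi-agent rollout,
    and all rollouts in the list answer the same query.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumed: population standard deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_agents(
    advantages: list[float], agents_per_rollout: list[int]
) -> list[list[float]]:
    """Give the lead agent and every subagent in a rollout the same advantage."""
    return [[adv] * n for adv, n in zip(advantages, agents_per_rollout)]
```

For example, `broadcast_to_agents(group_normalized_advantages([1.0, 0.0, 1.0, 0.0]), [3, 1, 2, 4])` yields one list per rollout, each repeating that rollout's normalized advantage once per agent. Sharing a single advantage across agents means the lead agent and subagents are credited jointly for the rollout's outcome.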
