Self-Optimizing Multi-Agent Systems for Deep Research

Arthur Câmara, Vincent Slot, Jakub Zavrel

Abstract

Given a user's complex information need, a multi-agent Deep Research system iteratively plans, retrieves, and synthesizes evidence across hundreds of documents to produce a high-quality answer. In one possible architecture, an orchestrator agent coordinates the process, while parallel worker agents execute tasks. Current Deep Research systems, however, often rely on hand-engineered prompts and static architectures, making improvement brittle, expensive, and time-consuming. We therefore explore various multi-agent optimization methods to show that enabling agents to self-play and explore different prompt combinations can produce high-quality Deep Research systems that match or outperform expert-crafted prompts.

Paper Structure

This paper contains 18 sections, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Architecture of a multi-agent DR system: an orchestrator agent (orchestrator) creates a list of tasks for the user's question, each consisting of a query and instructions. Multiple reader agents (reader) inspect batches of documents and extract the information requested in each task. An aggregator agent (aggregator) combines these smaller pieces of information into larger mini-reports, one per task. The orchestrator can then decide whether to refine the plan with more tasks or to call a writer agent (writer), which combines all merged information into a long-form report.
  • Figure 2: Example exploration trees for GEPA and TextGrad. Each node in a tree is a new candidate generated from its parent. GEPA explores different variants in a more diversified manner, while TextGrad explores far less broadly.
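To make the control flow in Figure 1 concrete, the orchestrator → reader → aggregator → writer loop can be sketched as plain Python. All function names, the string-based "agents", and the toy stopping rule are illustrative assumptions, not the paper's implementation; in a real system each function would wrap an LLM call and the readers would run in parallel.

```python
# Hypothetical sketch of the multi-agent DR loop from Figure 1.
# Each "agent" is a plain function standing in for an LLM-backed component.

def orchestrator(question, mini_reports):
    """Plan tasks; decide whether to refine the plan or hand off to the writer."""
    if not mini_reports:
        # First round: split the question into (query, instructions) tasks.
        return [("background on " + question, "extract key facts")], False
    # Toy stopping rule (assumption): one round of research is enough.
    return [], True

def reader(task, documents):
    """Inspect a batch of documents and extract task-relevant snippets."""
    query, _instructions = task
    return [d for d in documents
            if any(w in d.lower() for w in query.lower().split())]

def aggregator(task, extractions):
    """Combine extracted snippets into a mini-report for the task."""
    return f"Task {task[0]!r}: " + " ".join(extractions)

def writer(mini_reports):
    """Merge all mini-reports into a long-form report."""
    return "\n".join(mini_reports)

def deep_research(question, documents):
    """Run the orchestrator loop until it decides to call the writer."""
    mini_reports = []
    while True:
        tasks, done = orchestrator(question, mini_reports)
        if done:
            return writer(mini_reports)
        for task in tasks:
            extractions = reader(task, documents)  # readers could run in parallel
            mini_reports.append(aggregator(task, extractions))
```

The key design point the sketch preserves is that the orchestrator alone decides when to stop: readers and the aggregator only transform information, and the writer is invoked exactly once, on the accumulated mini-reports.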