
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research

Zijian Li, Xin Guan, Bo Zhang, Shen Huang, Houquan Zhou, Shaopeng Lai, Ming Yan, Yong Jiang, Pengjun Xie, Fei Huang, Jun Zhang, Jingren Zhou

TL;DR

Open-ended deep research requires synthesizing large volumes of web-scale information into grounded reports with accurate citations. WebWeaver tackles this with a dual-agent design that addresses long-context and citation-reliability challenges: a planner that co-evolves an evidence-backed outline, and a memory-grounded writer that assembles the report section by section. The approach achieves state-of-the-art results on three challenging benchmarks, and the work also introduces WebWeaver-3k for agentic fine-tuning of smaller models. It offers a practical blueprint for memory-aware, evidence-grounded long-form AI writing and knowledge work.

Abstract

This paper tackles open-ended deep research (OEDR), a complex challenge in which AI agents must synthesize vast amounts of web-scale information into insightful reports. Current approaches suffer from two limitations: static research pipelines that decouple planning from evidence acquisition, and monolithic generation paradigms that include redundant, irrelevant evidence and consequently suffer from hallucinations and low citation accuracy. To address these challenges, we introduce WebWeaver, a novel dual-agent framework that emulates the human research process. The planner operates in a dynamic cycle, iteratively interleaving evidence acquisition with outline optimization to produce a comprehensive, citation-grounded outline linked to a memory bank of evidence. The writer then executes a hierarchical retrieval and writing process, composing the report section by section. Because it retrieves from the memory bank, via the outline's citations, only the evidence needed for each part, the writer effectively mitigates long-context issues and citation hallucinations. Our framework establishes a new state of the art on major OEDR benchmarks, including DeepResearch Bench, DeepConsult, and DeepResearchGym. These results validate our human-centric, iterative methodology, demonstrating that adaptive planning and focused synthesis are crucial for producing comprehensive, trustworthy, and well-structured reports.
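To make the planner's cycle concrete, here is a minimal Python sketch; it is an illustration, not the authors' implementation. The names `MemoryBank`, `Section`, and `plan`, the `id_N` citation format, and the stopping rule (stop once a search round returns nothing new) are all hypothetical. In WebWeaver both the search action and the outline optimization are carried out by an LLM agent, so they appear here only as toy stubs.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryBank:
    """Hypothetical evidence store: each snippet is filed under a citation ID."""
    store: dict[str, str] = field(default_factory=dict)

    def add(self, text: str) -> str:
        cid = f"id_{len(self.store) + 1}"
        self.store[cid] = text
        return cid


@dataclass
class Section:
    """One outline entry, grounded by the citation IDs it points to."""
    title: str
    citations: list[str] = field(default_factory=list)


def plan(query, search, propose_sections, max_rounds=5):
    """Interleave evidence acquisition with outline optimization (sketch)."""
    memory = MemoryBank()
    outline: list[Section] = []
    for _ in range(max_rounds):
        # Search action, given the current outline so it could target gaps.
        new_evidence = search(query, outline)
        if not new_evidence:
            # Stand-in stopping rule: no new evidence means the outline is final.
            break
        for text in new_evidence:
            memory.add(text)
        # Outline optimization: rebuild sections over all evidence gathered so far.
        outline = propose_sections(query, memory)
    return outline, memory


def make_toy_search(corpus, per_call=2):
    """Toy search stub: returns the next `per_call` documents on each call."""
    pos = 0

    def search(query, outline):
        nonlocal pos
        batch = corpus[pos:pos + per_call]
        pos += per_call
        return batch

    return search


def toy_propose_sections(query, memory):
    """Toy optimizer stub: group citation IDs by the topic before the first ':'."""
    groups: dict[str, list[str]] = {}
    for cid, text in memory.store.items():
        topic = text.split(":", 1)[0]
        groups.setdefault(topic, []).append(cid)
    return [Section(title, cids) for title, cids in groups.items()]


corpus = [
    "cost: module prices fell",
    "policy: subsidies expanded",
    "cost: storage became cheaper",
    "policy: new tariffs announced",
]
outline, memory = plan("solar adoption", make_toy_search(corpus), toy_propose_sections)
for section in outline:
    print(section.title, section.citations)
# cost ['id_1', 'id_3']
# policy ['id_2', 'id_4']
```

The `search` callable receives the current outline so that a real planner could steer its next queries toward thin sections; the toy stub ignores it and simply pages through a fixed corpus.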


Paper Structure

This paper contains 21 sections, 1 equation, 14 figures, 4 tables.

Figures (14)

  • Figure 1: Performance of various deep research agents on DeepResearch Bench (RACE). The results on DeepResearch Bench are taken from the official leaderboard. Our proposed WebWeaver achieves state-of-the-art performance and even outperforms the reference answers.
  • Figure 2: Performance of various deep research agents on DeepResearch Bench (FACT). Our proposed WebWeaver achieves the highest number of effective citations and the highest citation accuracy.
  • Figure 3: (a) The search-then-generate paradigm first gathers information and then directly generates a report; (b) other paradigms decouple searching from outline generation; (c) WebWeaver not only enables a dynamic research cycle in which the outline and search strategy co-evolve, but also allows hierarchical and attentional writing by retrieving only the relevant evidence via the citations in the outline.
  • Figure 4: The workflow of WebWeaver. Left: the planner iteratively collects evidence via the search action and optimizes the outline until it outputs a comprehensive, citation-grounded outline. Right: the writer performs hierarchical and attentional writing by retrieving relevant evidence through the grounded citations in the outline.
  • Figure 5: Statistics of outline optimization of Claude-sonnet-4-20250514 on DeepResearch Bench and DeepResearchGym.
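The writer's hierarchical, citation-driven retrieval (Figure 4, right) can be sketched in the same hedged way. This sketch assumes the outline stores citation IDs per section and the memory bank is a plain dict from ID to evidence text; `write_report` and `toy_write_section` are hypothetical names, and the stub simply echoes evidence where the real writer would call an LLM. What it illustrates is that each section's writing step sees only the evidence that section cites, rather than the whole memory bank.

```python
from dataclasses import dataclass, field


@dataclass
class Section:
    """Outline entry: a title plus the citation IDs that ground it."""
    title: str
    citations: list[str] = field(default_factory=list)


def write_report(outline, memory, write_section):
    """Compose the report section by section from cited evidence only."""
    parts = []
    for section in outline:
        # Targeted retrieval: fetch just this section's cited evidence.
        # IDs missing from the memory bank are skipped (a safeguard added here).
        evidence = {cid: memory[cid] for cid in section.citations if cid in memory}
        parts.append(write_section(section.title, evidence))
    return "\n\n".join(parts)


def toy_write_section(title, evidence):
    """Toy writer stub standing in for an LLM call: echoes evidence with its IDs."""
    body = " ".join(f"{text} [{cid}]" for cid, text in evidence.items())
    return f"{title}\n{body}"


memory = {
    "id_1": "Module prices fell.",
    "id_2": "Subsidies expanded.",
    "id_3": "Storage became cheaper.",
}
outline = [
    Section("Cost", ["id_1", "id_3"]),
    Section("Policy", ["id_2", "id_9"]),  # id_9 is not in the memory bank
]
print(write_report(outline, memory, toy_write_section))
```

Keeping each writing call's context limited to one section's evidence is the property the abstract credits with reducing long-context problems; dropping unknown IDs is an extra guard in this sketch against citing evidence that was never collected.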