Deep Research: A Systematic Survey

Zhengliang Shi, Yiqun Chen, Haitao Li, Weiwei Sun, Shiyu Ni, Yougang Lyu, Run-Ze Fan, Bowen Jin, Yixuan Weng, Minjun Zhu, Qiujie Xie, Xinyu Guo, Qu Yang, Jiayi Wu, Jujia Zhao, Xiaqiang Tang, Xinbei Ma, Cunxiang Wang, Jiaxin Mao, Qingyao Ai, Jen-Tse Huang, Wenxuan Wang, Yue Zhang, Yiming Yang, Zhaopeng Tu, Zhaochun Ren

TL;DR

Deep Research (DR) reframes large language models as autonomous research agents capable of end-to-end reasoning, evidence gathering, and verifiable reporting. The paper formalizes a three-stage roadmap and four core components—query planning, information acquisition, memory management, and answer generation—and surveys practical optimization methods (workflow prompting, supervised fine-tuning, end-to-end RL) along with evaluation criteria and challenges. It differentiates DR from standard RAG by emphasizing long-horizon workflows, ground-truth citations, and iterative memory-informed reasoning. The work aims to standardize benchmarks and guide future research toward scalable, interpretable, and reliable AI-driven research across domains including science, policy, and software engineering.

Abstract

Large language models (LLMs) have rapidly evolved from text generators into powerful problem solvers. Yet many open tasks demand critical thinking, multi-source evidence, and verifiable outputs, which lie beyond single-shot prompting or standard retrieval-augmented generation. Recently, numerous studies have explored Deep Research (DR), which aims to combine the reasoning capabilities of LLMs with external tools, such as search engines, thereby empowering LLMs to act as research agents capable of completing complex, open-ended tasks. This survey presents a comprehensive and systematic overview of deep research systems, including a clear roadmap, foundational components, practical implementation techniques, important challenges, and future directions. Specifically, our main contributions are as follows: (i) we formalize a three-stage roadmap and distinguish deep research from related paradigms; (ii) we introduce four key components: query planning, information acquisition, memory management, and answer generation, each paired with fine-grained sub-taxonomies; (iii) we summarize optimization techniques, including prompting, supervised fine-tuning, and agentic reinforcement learning; and (iv) we consolidate evaluation criteria and open challenges, aiming to guide and facilitate future development. As the field of deep research continues to evolve rapidly, we are committed to continuously updating this survey to reflect the latest progress in this area.
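
To make the four components named in the abstract concrete, the following is a minimal Python sketch of how query planning, information acquisition, memory management, and answer generation might be wired into one loop. It is an illustration under stated assumptions, not the survey's implementation: the keyword-based planner, the two-sentence `CORPUS`, the substring "search", and every function name are hypothetical stand-ins for what would be LLM calls and a real search engine.

```python
from dataclasses import dataclass, field

# Hypothetical toy corpus standing in for a search engine (assumption).
CORPUS = [
    "Retrieval systems rank documents by relevance.",
    "Agents keep memory of prior evidence across steps.",
]


@dataclass
class Memory:
    """Memory management: stores evidence gathered so far, without duplicates."""
    notes: list = field(default_factory=list)

    def update(self, evidence):
        for item in evidence:
            if item not in self.notes:
                self.notes.append(item)


def plan_queries(question, memory):
    """Query planning: one sub-query per long keyword not yet covered by memory."""
    keywords = [w.strip("?.,").lower() for w in question.split() if len(w) > 4]
    covered = " ".join(memory.notes).lower()
    return [k for k in keywords if k not in covered]


def acquire(query):
    """Information acquisition: naive substring match over the toy corpus."""
    return [doc for doc in CORPUS if query in doc.lower()]


def generate_answer(question, memory):
    """Answer generation: list each stored piece of evidence with a citation index."""
    cited = [f"[{i + 1}] {note}" for i, note in enumerate(memory.notes)]
    return f"Q: {question}\n" + "\n".join(cited)


def deep_research(question, max_rounds=3):
    """Iterate plan -> acquire -> update memory until no open sub-queries remain."""
    memory = Memory()
    for _ in range(max_rounds):
        queries = plan_queries(question, memory)
        if not queries:
            break
        for query in queries:
            memory.update(acquire(query))
    return generate_answer(question, memory)


print(deep_research("How do agents combine retrieval with memory?"))
```

The loop re-plans against memory on every round, which is the property that separates this shape from a single retrieve-then-generate pass; the `max_rounds` bound is an arbitrary choice that keeps the sketch from looping on sub-queries the corpus cannot answer.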

Paper Structure

This paper contains 67 sections, 6 equations, 7 figures, 5 tables.

Figures (7)

  • Figure 1: An overview of the four key components in a general deep research system: Query Planning, Information Acquisition, Memory Management, and Answer Generation, each covered in its own section.
  • Figure 2: Taxonomy of the main content of this survey.
  • Figure 3: Three commonly-used types of query planning: (i) parallel planning; (ii) sequential planning; and (iii) tree-based planning.
  • Figure 4: Existing information filtering approaches can be broadly categorized into the following types: (i) Document Selection; (ii) Context Compression; and (iii) Rule-based Cleaning.
  • Figure 5: Memory management contains four key stages: (1) Memory Consolidation, (2) Memory Indexing, (3) Memory Updating, and (4) Memory Forgetting.
  • ...and 2 more figures
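
The three query planning types named in the Figure 3 caption can be contrasted with a small Python sketch. This is one plausible reading of the labels, not the survey's definitions: here parallel planning issues independent sub-queries at once, sequential planning derives each sub-query from the previous one, and tree-based planning expands a branching set of sub-queries breadth-first. The helper names and the `follow_up` and `expand` callbacks are hypothetical; in a real system they would be LLM calls.

```python
from collections import deque


def parallel_plan(question, aspects):
    """Parallel: all sub-queries are independent and can be issued together."""
    return [f"{question} {aspect}" for aspect in aspects]


def sequential_plan(first_query, follow_up, max_steps):
    """Sequential: each sub-query is derived from the one before it."""
    chain = [first_query]
    while len(chain) < max_steps:
        nxt = follow_up(chain[-1])  # hypothetical callback; returns None to stop
        if nxt is None:
            break
        chain.append(nxt)
    return chain


def tree_plan(root, expand, max_depth):
    """Tree-based: breadth-first expansion of sub-queries up to max_depth."""
    order = []
    frontier = deque([(0, root)])
    while frontier:
        depth, query = frontier.popleft()
        order.append((depth, query))
        if depth < max_depth:
            for child in expand(query):  # hypothetical callback
                frontier.append((depth + 1, child))
    return order


# Toy callbacks backed by lookup tables (assumptions for illustration only).
FOLLOW_UPS = {"who wrote the survey": "where do those authors work"}
CHILDREN = {
    "deep research": ["deep research planning", "deep research memory"],
    "deep research planning": ["deep research tree planning"],
}

print(parallel_plan("deep research", ["benchmarks", "training"]))
print(sequential_plan("who wrote the survey", FOLLOW_UPS.get, max_steps=5))
print(tree_plan("deep research", lambda q: CHILDREN.get(q, []), max_depth=1))
```

The practical difference is in the dependency structure: the parallel list can be dispatched concurrently, the sequential chain cannot advance until the prior step returns, and the tree trades extra queries for the ability to explore alternative branches.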