SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing

Xiangchao Yan, Shiyang Feng, Jiakang Yuan, Renqiu Xia, Bin Wang, Bo Zhang, Lei Bai

TL;DR

SurveyForge addresses core weaknesses in AI-generated surveys by separating outline construction from content generation and grounding both stages in memory-guided retrieval. The two-stage pipeline combines heuristic outline generation from domain knowledge bases with a Scholar Navigation Agent (SANA) that retrieves high-quality literature and writes content in parallel, followed by refinement. The authors establish SurveyBench and SAM metrics to enable objective, multi-dimensional evaluation, demonstrating significant improvements over prior systems in outline quality, reference relevance, and content coherence. The approach promises scalable, up-to-date survey generation with reduced cost and time, while highlighting directions to further enhance deep inter-paper reasoning and citation accuracy.

Abstract

Survey papers play a crucial role in scientific research, especially given the rapid growth of research publications. Recently, researchers have begun using LLMs to automate survey generation for better efficiency. However, the quality gap between LLM-generated surveys and those written by humans remains significant, particularly in terms of outline quality and citation accuracy. To close these gaps, we introduce SurveyForge, which first generates the outline by analyzing the logical structure of human-written outlines and referring to the retrieved domain-related articles. Subsequently, leveraging high-quality papers retrieved from memory by our scholar navigation agent, SurveyForge automatically generates and refines the content of the article. Moreover, to achieve a comprehensive evaluation, we construct SurveyBench, which includes 100 human-written survey papers for win-rate comparison and assesses AI-generated survey papers across three dimensions: reference, outline, and content quality. Experiments demonstrate that SurveyForge can outperform previous works such as AutoSurvey.
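The two-stage flow described in the abstract (outline first, then per-subsection retrieval and writing, then refinement) can be pictured with a toy sketch. This is not the authors' implementation: every function name here (`generate_outline`, `retrieve_papers`, `write_subsection`, `build_survey`) is hypothetical, the "heuristic" is a simple count of headings that recur across human-written outlines, retrieval is plain keyword overlap rather than SANA, and the LLM call is replaced by a stub string.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def generate_outline(human_outlines):
    """Stage 1 stand-in: keep headings that recur in human-written outlines."""
    counts = Counter()
    for outline in human_outlines:
        for heading in dict.fromkeys(outline):  # count each heading once per outline
            counts[heading] += 1
    return [heading for heading, n in counts.items() if n >= 2]


def retrieve_papers(heading, memory, k=2):
    """Toy retrieval (assumption): rank papers by keyword overlap with the heading."""
    words = set(heading.lower().split())
    scored = [(len(words & keywords), title) for title, keywords in memory.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [title for score, title in ranked if score > 0][:k]


def write_subsection(heading, memory):
    """Stand-in for the LLM call: emit a stub paragraph citing retrieved papers."""
    papers = retrieve_papers(heading, memory)
    return f"{heading}\n(draft citing: {', '.join(papers) or 'no match'})"


def build_survey(topic, human_outlines, memory):
    """Stage 2 stand-in: draft subsections in parallel, then join them."""
    outline = generate_outline(human_outlines)
    with ThreadPoolExecutor() as pool:
        # pool.map preserves outline order while drafts are produced concurrently
        drafts = list(pool.map(lambda h: write_subsection(h, memory), outline))
    # Refinement stand-in: concatenate the drafts under a title line
    return "\n\n".join([f"Survey: {topic}"] + drafts)


if __name__ == "__main__":
    outlines = [
        ["Introduction", "Methods", "Evaluation"],
        ["Introduction", "Evaluation", "Future Work"],
    ]
    memory = {"Paper A": {"evaluation", "benchmark"}, "Paper B": {"introduction"}}
    print(build_survey("LLM agents", outlines, memory))
```

The point of the sketch is only the separation of concerns: the outline is fixed before any content is written, and each subsection is drafted independently from its own retrieved references, which is what makes parallel writing possible.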

Paper Structure

This paper contains 24 sections, 5 equations, 10 figures, 7 tables, 1 algorithm.

Figures (10)

  • Figure 1: Compared to human-written surveys, AI-generated surveys face two primary challenges. First, regarding the outline, these papers may often lack coherent logic and well-structured organization. Second, with respect to references, they frequently fail to include truly relevant and influential literature.
  • Figure 2: The overview of SurveyForge. The framework consists of two main stages: Outline Generation and Content Writing. In the Outline Generation stage, SurveyForge utilizes heuristic learning to generate well-structured outlines by leveraging topic-relevant literature and structural patterns from existing surveys. In the Content Writing stage, a memory-driven Scholar Navigation Agent (SANA) retrieves high-quality literature for each subsection, and an LLM generates the content of each subsection. Finally, the content is synthesized and refined into a coherent and comprehensive survey.
  • Figure 3: Evaluation results on SurveyBench. Evaluation results of (a) Input Coverage, (b) Reference Coverage, (c) Outline Quality, and (d) Content Quality.
  • Figure 4: Comparisons of survey outlines generated by the baseline method (left) and our proposed framework (right). The baseline displays a fragmented structure, whereas our method yields a more comprehensive, systematically organized outline.
  • Figure 5: Example of the survey generated by SurveyForge. Please refer to https://anonymous.4open.science/r/survey_example-7C37/ for more auto-generated results.
  • ...and 5 more figures