SurveyForge: On the Outline Heuristics, Memory-Driven Generation, and Multi-dimensional Evaluation for Automated Survey Writing

Xiangchao Yan; Shiyang Feng; Jiakang Yuan; Renqiu Xia; Bin Wang; Bo Zhang; Lei Bai

TL;DR

SurveyForge addresses core weaknesses in AI-generated surveys by separating outline construction from content generation and grounding both stages in memory-guided retrieval. The two-stage pipeline combines heuristic outline generation from domain knowledge bases with a Scholar Navigation Agent (SANA) that retrieves high-quality literature and writes content in parallel, followed by refinement. The authors establish SurveyBench and SAM metrics to enable objective, multi-dimensional evaluation, demonstrating significant improvements over prior systems in outline quality, reference relevance, and content coherence. The approach promises scalable, up-to-date survey generation with reduced cost and time, while highlighting directions to further enhance deep inter-paper reasoning and citation accuracy.

Abstract

Survey papers play a crucial role in scientific research, especially given the rapid growth of research publications. Recently, researchers have begun using LLMs to automate survey generation for better efficiency. However, the quality gap between LLM-generated surveys and those written by humans remains significant, particularly in terms of outline quality and citation accuracy. To close these gaps, we introduce SurveyForge, which first generates the outline by analyzing the logical structure of human-written outlines and referring to the retrieved domain-related articles. Subsequently, leveraging high-quality papers retrieved from memory by our scholar navigation agent, SurveyForge automatically generates and refines the content of the article. Moreover, to achieve a comprehensive evaluation, we construct SurveyBench, which includes 100 human-written survey papers for win-rate comparison and assesses AI-generated survey papers across three dimensions: reference, outline, and content quality. Experiments demonstrate that SurveyForge can outperform previous works such as AutoSurvey.
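To make the two-stage flow described above concrete, the following is a minimal illustrative sketch and not the authors' implementation. It builds an outline first, then retrieves papers per section from an in-memory store and drafts the sections in parallel. All names (`MEMORY`, `make_outline`, `retrieve`, `write_section`, `generate_survey`) and the word-overlap scoring are assumptions made for illustration. A real system would call an LLM and a proper retriever at the marked placeholders, and would add the refinement step, which is omitted here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical literature memory: paper title -> abstract text.
MEMORY = {
    "Paper A": "retrieval augmented generation for long documents",
    "Paper B": "evaluation metrics for generated text quality",
    "Paper C": "outline planning for long form generation",
}


def make_outline(topic):
    # Stage 1 placeholder: a real system would prompt an LLM with
    # human-written outlines and retrieved domain articles.
    return [
        f"{topic}: outline planning",
        f"{topic}: retrieval",
        f"{topic}: evaluation metrics",
    ]


def retrieve(section, k=1):
    # Toy scoring: rank papers by word overlap with the section heading,
    # breaking ties alphabetically by title.
    words = set(section.lower().replace(":", " ").split())
    ranked = sorted(
        MEMORY,
        key=lambda title: (-len(words & set(MEMORY[title].split())), title),
    )
    return ranked[:k]


def write_section(section):
    # Stage 2 placeholder: a real system would draft text from the
    # retrieved papers; here we only record which papers were selected.
    return {"heading": section, "refs": retrieve(section)}


def generate_survey(topic):
    outline = make_outline(topic)
    # Sections are independent, so they can be drafted in parallel.
    # pool.map returns results in outline order.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(write_section, outline))


if __name__ == "__main__":
    for part in generate_survey("LLM"):
        print(part["heading"], part["refs"])
```

Fixing the outline before any section is drafted is what allows the sections to be written independently and in parallel in this sketch.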
