SurveyX: Academic Survey Automation via Large Language Models

Xun Liang, Jiawei Yang, Yezhaohui Wang, Chen Tang, Zifan Zheng, Shichao Song, Zehao Lin, Yebin Yang, Simin Niu, Hanyu Wang, Bo Tang, Feiyu Xiong, Keming Mao, Zhiyu Li

TL;DR

The paper addresses the challenge of generating comprehensive, up-to-date academic surveys amidst rapidly expanding literature. It introduces SurveyX, a two-phase system comprising Preparation (reference acquisition with online retrieval and AttributeTree preprocessing) and Generation (outline/content generation plus post-refinement) to produce high-quality surveys with enriched figures and tables. The approach is evaluated with expanded automatic and human metrics, showing SurveyX outperforms prior automated methods and approaches human expert performance across multiple dimensions. This work suggests a scalable framework for automated, credible scholarly surveys with practical implications for researchers and evaluators.

Abstract

Large Language Models (LLMs) have demonstrated exceptional comprehension capabilities and a vast knowledge base, suggesting that LLMs can serve as efficient tools for automated survey generation. However, recent research on automated survey generation remains constrained by critical limitations such as finite context windows, a lack of in-depth content discussion, and the absence of systematic evaluation frameworks. Inspired by human writing processes, we propose SurveyX, an efficient and organized system for automated survey generation that decomposes the survey composing process into two phases: Preparation and Generation. By introducing online reference retrieval, a pre-processing method called AttributeTree, and a re-polishing process, SurveyX significantly enhances the efficacy of survey composition. Experimental evaluation results show that SurveyX outperforms existing automated survey generation systems in content quality (0.259 improvement) and citation quality (1.76 enhancement), approaching human expert performance across multiple evaluation dimensions. Examples of surveys generated by SurveyX are available at www.surveyx.cn.

Paper Structure

This paper contains 30 sections, 6 equations, 16 figures, 4 tables, 1 algorithm.

Figures (16)

  • Figure 1: The number of papers received annually by the arXiv website from 2010 to 2025, with data sourced from our arXiv database. The number of submissions for 2025 is projected to be five times greater than that of 2010.
  • Figure 2: Pipeline of SurveyX.
  • Figure 3: An example of generating secondary outlines. LLMs first generate hints based on the attribute tree to guide the generation of the secondary outline. Then, by synthesizing all hints, LLMs identify the most suitable entry points to determine the segmentation strategy and generate the secondary outline.
  • Figure 4: Human evaluation results.
  • Figure 5: Content coverage prompt for evaluation.
  • ...and 11 more figures
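
The two-step flow described in the Figure 3 caption (hints first, then a synthesized secondary outline) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the function names, the prompt wording, the flat-dictionary stand-in for an attribute tree, and the `fake_llm` stub are all assumptions made for this example.

```python
from typing import Callable, Dict, List

# Hypothetical interface: any function that maps a prompt string to a completion string.
LLM = Callable[[str], str]


def generate_hints(llm: LLM,
                   attribute_trees: Dict[str, Dict[str, str]],
                   section_title: str) -> List[str]:
    """Step 1: ask the model for one hint per reference.

    Each attribute tree is simplified here to a flat dict of
    attribute name -> value (an assumption for illustration).
    """
    hints = []
    for ref_id, tree in attribute_trees.items():
        attrs = "; ".join(f"{name}: {value}" for name, value in tree.items())
        prompt = (
            f"HINT\nSection: {section_title}\n"
            f"Reference {ref_id} attributes: {attrs}\n"
            "Suggest one way this reference could shape the subsections."
        )
        hints.append(llm(prompt))
    return hints


def generate_secondary_outline(llm: LLM,
                               section_title: str,
                               hints: List[str]) -> List[str]:
    """Step 2: synthesize all hints into a list of subsection titles."""
    prompt = (
        f"OUTLINE\nSection: {section_title}\nHints:\n"
        + "\n".join(f"- {hint}" for hint in hints)
        + "\nChoose a segmentation strategy and return one subsection title per line."
    )
    response = llm(prompt)
    # Keep non-empty lines only; each line is treated as one subsection title.
    return [line.strip() for line in response.splitlines() if line.strip()]


def fake_llm(prompt: str) -> str:
    """Deterministic stand-in for a real model, used only to exercise the flow."""
    if prompt.startswith("HINT"):
        return "group by retrieval strategy"
    return "Sparse retrieval\nDense retrieval\n"


if __name__ == "__main__":
    trees = {
        "ref1": {"method": "BM25"},
        "ref2": {"method": "dense encoder"},
    }
    hints = generate_hints(fake_llm, trees, "Retrieval")
    print(generate_secondary_outline(fake_llm, "Retrieval", hints))
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client; a real system would replace `fake_llm` with an actual model call and would likely need more robust parsing of the response.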