AutoSurvey: Large Language Models Can Automatically Write Surveys

Yidong Wang, Qi Guo, Wenjin Yao, Hongbo Zhang, Xin Zhang, Zhen Wu, Meishan Zhang, Xinyu Dai, Min Zhang, Qingsong Wen, Wei Ye, Shikun Zhang, Yue Zhang

TL;DR

The paper addresses the challenge of rapidly summarizing a growing body of AI research, where traditional surveys lag behind. It introduces AutoSurvey, a pipeline that combines initial retrieval, parallel outline and subsection drafting, and iterative refinement, guided by retrieval-augmented generation and multi-LLM evaluation. Extensive experiments indicate that AutoSurvey achieves near-human content and citation quality in a fraction of the time, compared with human-written surveys and naive RAG baselines. The work also provides a structured evaluation framework and detailed ablations, offering a scalable approach to automated scholarly synthesis.

Abstract

This paper introduces AutoSurvey, a speedy and well-organized methodology for automating the creation of comprehensive literature surveys in rapidly evolving fields like artificial intelligence. Traditional survey paper creation faces challenges due to the vast volume and complexity of information, prompting the need for efficient survey methods. While large language models (LLMs) offer promise in automating this process, challenges such as context window limitations, parametric knowledge constraints, and the lack of evaluation benchmarks remain. AutoSurvey addresses these challenges through a systematic approach that involves initial retrieval and outline generation, subsection drafting by specialized LLMs, integration and refinement, and rigorous evaluation and iteration. Our contributions include a comprehensive solution to the survey problem, a reliable evaluation method, and experimental validation demonstrating AutoSurvey's effectiveness. We open our resources at https://github.com/AutoSurveys/AutoSurvey.
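
The four stages named in the abstract can be read as a simple control flow. The Python sketch below is only an illustration of that flow under stated assumptions: every function name, the `candidates` parameter, and the length-based score are hypothetical placeholders, not the authors' code or API. Each stub stands in for a retrieval or LLM call.

```python
from concurrent.futures import ThreadPoolExecutor


def retrieve_papers(topic):
    # Hypothetical stand-in for retrieval over a paper database.
    return [f"{topic} paper {i}" for i in range(1, 4)]


def generate_outline(topic, papers):
    # Hypothetical stand-in for an LLM call that proposes subsection titles.
    return [f"{topic}: background", f"{topic}: methods", f"{topic}: open problems"]


def draft_subsection(title, papers):
    # Hypothetical stand-in for an LLM call that drafts one subsection.
    return f"[{title}] draft grounded in {len(papers)} retrieved papers"


def integrate_and_refine(drafts):
    # Hypothetical stand-in for merging drafts and smoothing transitions.
    return "\n\n".join(drafts)


def evaluate(survey):
    # Hypothetical stand-in for LLM-based scoring; here just a length proxy.
    return float(len(survey))


def auto_survey_sketch(topic, candidates=2):
    """Run the four stages `candidates` times and keep the best-scoring draft."""
    best_survey, best_score = None, float("-inf")
    for _ in range(candidates):
        # Stage 1: initial retrieval and outline generation.
        papers = retrieve_papers(topic)
        outline = generate_outline(topic, papers)
        # Stage 2: subsections are independent, so draft them in parallel.
        # pool.map returns results in outline order.
        with ThreadPoolExecutor() as pool:
            drafts = list(pool.map(lambda t: draft_subsection(t, papers), outline))
        # Stage 3: integration and refinement.
        survey = integrate_and_refine(drafts)
        # Stage 4: evaluation and iteration.
        score = evaluate(survey)
        if score > best_score:
            best_survey, best_score = survey, score
    return best_survey


print(auto_survey_sketch("emotion recognition"))
```

Drafting subsections in parallel is what keeps any single call within a limited context window: each call sees only its own subsection title and the retrieved papers, not the whole survey.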

Paper Structure (23 sections, 4 figures, 7 tables, 1 algorithm)

Figures (4)

  • Figure 1: Growth trends from 2019 to 2024 in the number of LLM-related papers (a) and surveys (b) on arXiv, accompanied by a t-SNE visualization. The 2024 data runs through April, with a red bar showing the forecast for the full year. Although the number of surveys is increasing rapidly, the visualization reveals areas where comprehensive surveys are still lacking. The research topics of the clusters in the t-SNE plot were generated using GPT-4 to describe their primary focus areas. AutoSurvey can address these research gaps at a cost of $1.2 and 3 minutes per survey (see the computational cost analysis in the appendix). An example survey on Emotion Recognition using LLMs is given in the appendix.
  • Figure 2: The AutoSurvey Pipeline for Generating Comprehensive Surveys.
  • Figure 3: Spearman's rho values indicating the degree of correlation between rankings given by LLMs and human experts. Note that a value over 0.3 indicates a positive correlation and a value over 0.5 indicates a strong positive correlation.
  • Figure 4: Impact of Iteration on AutoSurvey Performance.
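
Figure 3 reports Spearman's rho between LLM and human rankings. As a minimal illustration of how such a value is computed, the Python sketch below ranks both score lists (averaging tied ranks) and takes the Pearson correlation of the ranks. The two example rankings are made up for illustration and are not data from the paper.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one ranking is constant")
    return cov / (var_x * var_y) ** 0.5


# Illustrative rankings of five surveys (hypothetical, not from the paper).
human_ranking = [1, 2, 3, 4, 5]
llm_ranking = [2, 1, 3, 5, 4]
print(spearman_rho(human_ranking, llm_ranking))
```

For these two rankings rho is 0.8, which the caption's thresholds would class as a strong positive correlation. In practice, `scipy.stats.spearmanr` computes the same statistic.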