
SurveySum: A Dataset for Summarizing Multiple Scientific Articles into a Survey Section

Leandro Carísio Fernandes, Gustavo Bartz Guedes, Thiago Soares Laitz, Thales Sales Almeida, Rodrigo Nogueira, Roberto Lotufo, Jayr Pereira

TL;DR

SurveySum introduces a dataset for summarizing multiple scientific articles into a single survey section, addressing the lack of domain-specific resources for multi-document survey generation. It defines two pipelines built on a shared three-stage workflow for constructing survey sections from cited papers, and evaluates them with the References F1, G-Eval, and Check-Eval metrics to assess citation use, content quality, and consistency. The key findings are that retrieval quality and the choice of LLM (e.g., GPT-4 vs. GPT-3.5) substantially affect performance, and that Pipeline 2 generally outperforms Pipeline 1. The work provides a foundation for benchmarking and advancing domain-specific survey summarization in the AI, ML, and NLP literature, and it demonstrates the practical importance of high-quality retrieval for producing coherent, accurate survey sections drawn from diverse scientific sources.
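References F1 measures how well a generated section cites the same papers as the original survey section. The paper gives the exact definition; the code below is only a minimal sketch that assumes a set-based F1 over numeric bracket citations such as "[3]" or "[1, 4]". The helper names `extract_citation_ids` and `references_f1` are hypothetical.

```python
import re


def extract_citation_ids(text):
    """Collect numeric citation IDs from bracket markers like "[3]" or "[1, 4]".
    Assumption: generated sections cite papers with numeric brackets."""
    ids = set()
    for group in re.findall(r"\[(\d+(?:\s*,\s*\d+)*)\]", text):
        ids.update(int(part) for part in group.split(","))
    return ids


def references_f1(generated_ids, gold_ids):
    """Set-based F1 between the IDs cited in a generated section and the IDs
    cited in the original survey section (hypothetical formulation)."""
    generated, gold = set(generated_ids), set(gold_ids)
    true_positives = len(generated & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(generated)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# The generated text cites {1, 2, 5}; the original section cites {1, 2, 3, 4}.
# Precision is 2/3 and recall is 2/4, so F1 = 4/7.
cited = extract_citation_ids("Dense retrievers [1] outperform BM25 [2, 5].")
print(round(references_f1(cited, {1, 2, 3, 4}), 4))  # 0.5714
```

Because F1 is the harmonic mean of precision and recall, this score penalizes both citing papers the original section did not use and omitting papers it did use.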

Abstract

Document summarization is the task of condensing texts into concise, informative summaries. This paper introduces a novel dataset designed for summarizing multiple scientific articles into a section of a survey. Our contributions are: (1) SurveySum, a new dataset addressing the gap in domain-specific summarization tools; (2) two specific pipelines to summarize scientific articles into a section of a survey; and (3) an evaluation of these pipelines using multiple metrics to compare their performance. Our results highlight the importance of high-quality retrieval stages and show how different configurations affect the quality of the generated summaries.
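The pipelines follow a retrieve-then-generate pattern, which is why retrieval quality matters so much. The code below is a minimal sketch of that pattern and is not the authors' implementation. It assumes the stages are (1) splitting cited papers into chunks, (2) ranking the chunks against the section title and keeping the top k, and (3) assembling a prompt for an LLM. The toy term-overlap scorer stands in for the real retrievers and rerankers, and the LLM call is omitted. All function names are hypothetical.

```python
import re
from collections import Counter


def chunk_text(text, max_words=50):
    """Stage 1 (assumed): split a paper's text into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def tokenize(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap_score(query, chunk):
    """Toy relevance score: how often the query terms occur in the chunk.
    This stands in for a real retriever or LLM reranker."""
    query_terms = set(tokenize(query))
    counts = Counter(tokenize(chunk))
    return sum(counts[term] for term in query_terms)


def select_chunks(section_title, papers, k=3, max_words=50):
    """Stage 2 (assumed): score every chunk of every cited paper and keep the top k."""
    candidates = []
    for paper_id, text in papers.items():
        for chunk in chunk_text(text, max_words):
            candidates.append((overlap_score(section_title, chunk), paper_id, chunk))
    # A stable sort keeps the input order for chunks with equal scores.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return [(paper_id, chunk) for _, paper_id, chunk in candidates[:k]]


def build_prompt(section_title, selected):
    """Stage 3 (assumed): build a generation prompt that asks the LLM to cite by ID."""
    context = "\n".join(f"[{pid}] {chunk}" for pid, chunk in selected)
    return (f"Write the survey section titled '{section_title}' using only the "
            f"excerpts below, citing them by their bracketed IDs.\n\n{context}")


papers = {
    1: "dense retrieval maps queries and passages into vectors for retrieval",
    2: "a study of bread baking temperatures",
}
top = select_chunks("dense retrieval", papers, k=1)
print(build_prompt("Dense Retrieval", top))
```

If stage 2 selects irrelevant chunks, stage 3 has nothing useful to summarize. This is the dependency that the results on retrieval quality point to.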

Paper Structure (15 sections, 1 equation, 6 figures, 1 table)

Figures (6)

  • Figure 1: Dataset Creation Process.
  • Figure 2: Three stages to generate a section of a survey.
  • Figure 3: Prompt used to write the section of the survey in pipeline 1.
  • Figure 4: Prompt used to rerank the chunks in pipeline 2.
  • Figure 5: Prompt used to write the section of the survey in pipeline 2.
  • ...and 1 more figure