ACLSum: A New Dataset for Aspect-based Summarization of Scientific Publications

Sotaro Takeshita, Tommaso Green, Ines Reinig, Kai Eckert, Simone Paolo Ponzetto

TL;DR

ACLSum introduces an expert-annotated, multi-aspect summarization dataset for scholarly NLP publications, enabling extractive and abstractive summaries across Challenge, Approach, and Outcome. The authors compare extract-then-abstract and end-to-end paradigms, and evaluate Llama 2 with instruction tuning and chain-of-thought-style training, alongside a greedy heuristic for extractive label induction. They show end-to-end abstractive methods generally outperform extract-then-abstract pipelines, particularly when high-quality extractive supervision is feasible, and reveal limitations of heuristic extractive labeling. The dataset and findings establish a concrete benchmark for scholarly text summarization and motivate future cross-domain, multilingual, and multi-document extensions.
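The TL;DR mentions a greedy heuristic for inducing extractive labels. The exact procedure is not described here. The sketch below assumes the common greedy oracle: start from an empty selection, repeatedly add the source sentence that most increases a ROUGE score against the reference abstractive summary, and stop when no remaining sentence helps. ROUGE is approximated by a simplified ROUGE-1/ROUGE-2 F1 on lowercased word tokens (no stemming). The function names, the `max_sentences` cap, and the toy sentences are hypothetical.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate, reference, n=1):
    """Simplified ROUGE-N F1: lowercased word tokens, no stemming or stopword removal."""
    cand = ngrams(re.findall(r"\w+", candidate.lower()), n)
    ref = ngrams(re.findall(r"\w+", reference.lower()), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def oracle_score(selected, sentences, reference):
    """Mean of ROUGE-1 and ROUGE-2 F1 for the selected sentences in document order."""
    text = " ".join(sentences[i] for i in sorted(selected))
    return (rouge_n_f1(text, reference, 1) + rouge_n_f1(text, reference, 2)) / 2


def greedy_oracle(sentences, reference, max_sentences=3):
    """Hypothetical greedy labeler: returns indices of sentences labeled as extractive."""
    selected, best = [], 0.0
    while len(selected) < max_sentences:
        candidates = [
            (oracle_score(selected + [i], sentences, reference), i)
            for i in range(len(sentences))
            if i not in selected
        ]
        if not candidates:
            break
        score, idx = max(candidates)
        if score <= best:  # no remaining sentence improves the score: stop
            break
        selected.append(idx)
        best = score
    return sorted(selected)


sentences = [
    "We study summarization of scientific papers.",
    "Existing datasets are crawled from the web and are noisy.",
    "We propose an expert-annotated dataset.",
    "Experiments use large language models.",
]
reference = "Existing summarization datasets are noisy because they are crawled from the web."
print(greedy_oracle(sentences, reference))  # → [1]
```

Because each step only keeps the locally best sentence, such a heuristic can miss combinations that would score higher jointly, which is one way heuristic extractive labels can fall short of expert annotations.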

Abstract

Extensive efforts in the past have been directed toward the development of summarization datasets. However, most of these resources have been generated (semi-)automatically, typically by crawling web data, which yields subpar resources for training and evaluating summarization systems. This quality compromise is arguably due to the substantial cost of producing ground-truth summaries, particularly for diverse languages and specialized domains. To address this issue, we present ACLSum, a novel summarization dataset carefully crafted and evaluated by domain experts. In contrast to previous datasets, ACLSum facilitates multi-aspect summarization of scientific papers, covering challenges, approaches, and outcomes in depth. Through extensive experiments, we evaluate the quality of our resource and the performance of models based on pretrained language models and state-of-the-art large language models (LLMs). Additionally, we explore the effectiveness of extractive versus abstractive summarization within the scholarly domain on the basis of automatically discovered aspects. Our results corroborate previous findings in the general domain and indicate the general superiority of end-to-end aspect-based summarization. Our data is released at https://github.com/sobamchan/aclsum.

Paper Structure

This paper contains 45 sections, 5 figures, and 10 tables.

Figures (5)

  • Figure 1: A data sample from ACLSum. Each document is complemented with manually crafted and validated summaries for both extractive and abstractive setups on three different aspects. The aspect-annotated sentences serve as extractive summaries.
  • Figure 2: Relative positions of relevant sentences for each aspect (Challenge, Approach and Outcome).
  • Figure 3: Average ROUGE scores between aspect-annotated sentences and abstractive summaries.
  • Figure 4: Maximum and average similarity of each aspect-annotated sentence to the centroid of sentences for that aspect using Sentence-T5 embeddings.
  • Figure 5: Screenshot of annotation procedure with INCEpTION.
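Figure 4 measures how tightly each aspect's annotated sentences cluster: each sentence embedding is compared with the centroid of all embeddings for that aspect, and the maximum and average similarities are reported. A minimal sketch of that computation follows. It assumes the Sentence-T5 embeddings have already been computed and that cosine similarity is the measure (the caption does not name one); the function names and the toy 2-D vectors are hypothetical stand-ins.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def centroid_similarity_stats(embeddings):
    """(max, mean) cosine similarity of each sentence embedding to the aspect centroid."""
    c = centroid(embeddings)
    sims = [cosine(e, c) for e in embeddings]
    return max(sims), sum(sims) / len(sims)


# Toy 2-D vectors standing in for Sentence-T5 embeddings of one aspect's sentences.
challenge_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
max_sim, mean_sim = centroid_similarity_stats(challenge_embeddings)
print(round(max_sim, 4), round(mean_sim, 4))  # → 1.0 0.8047
```

A high maximum paired with a lower average, as in this toy case, indicates that some sentences sit close to the aspect's "typical" sentence while others are spread further out.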