A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models

Haopeng Zhang, Philip S. Yu, Jiawei Zhang

TL;DR

This survey traces the evolution of text summarization across four paradigms (statistical methods, deep learning, pre-trained language models, and the current large language model era) and provides a unified view of datasets, metrics, and methods. It presents a two-part analysis: (i) pre-LLM summarization, covering traditional statistical, neural, and PLM-fine-tuning approaches, and (ii) the LLM-era landscape, including benchmarking, modeling, and evaluation studies, organized under a new taxonomy for LLM-based summarization. The work tabulates representative methods and datasets and discusses trends, open challenges, and future directions, highlighting issues such as hallucination, bias, efficiency, personalization, and interpretability. By synthesizing prior methods with contemporary LLM-based techniques, the paper aims to guide researchers and practitioners toward robust, faithful, and domain-aware summarization systems for real-world applications.

Abstract

Text summarization research has undergone several significant transformations with the advent of deep neural networks, pre-trained language models (PLMs), and recent large language models (LLMs). This survey thus provides a comprehensive review of the research progress and evolution in text summarization through the lens of these paradigm shifts. It is organized into two main parts: (1) a detailed overview of datasets, evaluation metrics, and summarization methods before the LLM era, encompassing traditional statistical methods, deep learning approaches, and PLM fine-tuning techniques, and (2) the first detailed examination of recent advancements in benchmarking, modeling, and evaluating summarization in the LLM era. By synthesizing existing literature and presenting a cohesive overview, this survey also discusses research trends and open challenges and proposes promising research directions in summarization, aiming to guide researchers through the evolving landscape of summarization research.

Paper Structure

This paper contains 71 sections, 2 equations, 8 figures, and 6 tables.

Figures (8)

  • Figure 1: The Evolution of the Four Major Paradigms in Text Summarization Research.
  • Figure 2: Categorization of Summarization Approaches based on input formats and output styles.
  • Figure 3: Example of Abstractive and Extractive Summaries for a News Article from CNN/DailyMail.
  • Figure 4: Taxonomy of Representative Summarization Methods prior to LLMs.
  • Figure 5: Left: Illustration of representing a document consisting of 6 sentences $\{s_1, \ldots, s_6\}$ as a graph. Each node represents one sentence, and the edge weights represent sentence similarities. Right: Illustration of graph attention by node 2 on its neighborhood. Here $\alpha_{ij}$ denotes the normalized attention score between node $i$ and node $j$.
  • ...and 3 more figures