Recent Trends in Unsupervised Summarization
Mohammad Khosravani, Amine Trabelsi
TL;DR
Unsupervised summarization addresses the problem of producing concise, informative summaries without labeled data. The paper presents a fine-grained taxonomy separating abstractive, extractive, and hybrid methods, and surveys advances across language-model-based generation, reconstruction-based training, and data-driven fine-tuning, including weakly/self-supervised and few-shot approaches. It also reviews datasets and evaluation methods, analyzes trends and limitations, and discusses practical concerns such as the cost and reliability of large language models and the challenges of long-/multi-document summarization. The work serves as a comprehensive reference for researchers who want to understand current techniques, datasets, and evaluation practices, and to identify promising directions for scalable and domain-adaptive unsupervised summarization.
Abstract
Unsupervised summarization enables training summarization models without labeled datasets. This survey covers recent techniques and models for unsupervised summarization, including extractive, abstractive, and hybrid models and the strategies used to train them. While the main focus is on recent research, we also cover some important earlier work. We additionally introduce a taxonomy that classifies prior work by its approach to unsupervised training. Finally, we discuss the current approaches and describe some datasets and evaluation methods.
