An Empirical Analysis of Diversity in Argument Summarization

Michiel van der Meer; Piek Vossen; Catholijn M. Jonker; Pradeep K. Murukannaiah

TL;DR

This paper investigates diversity in argument summarization by examining three facets (diversity of opinions, annotators, and data sources) within the Key Point Analysis (KPA) framework. It benchmarks prompt-based LLMs (e.g., ChatGPT in open-book and closed-book modes) and a dedicated KPA model (SMatchToPR/Debater) across the ArgKP, Perspectrum, and PVE datasets, focusing on long-tail opinions, annotator disagreement, and cross-source transfer. The findings show that LLMs excel at generating key points but struggle to reliably match arguments to key points, while dedicated models can outperform LLMs on certain datasets but do not dominate across tasks; diversifying the training data may improve generalization. The study highlights the need for diversity-aware strategies in practical argument summarization and outlines ethical and methodological considerations for deploying such systems in sensitive contexts.

Abstract

Presenting high-level arguments is a crucial task for fostering participation in online societal discussions. Current argument summarization approaches miss an important facet of this task, capturing diversity, which is needed to accommodate multiple perspectives. We introduce three aspects of diversity: those of opinions, annotators, and sources. We evaluate approaches to a popular argument summarization task called Key Point Analysis and show that these approaches struggle to (1) represent arguments shared by few people, (2) deal with data from various sources, and (3) align with subjectivity in human-provided annotations. We find that both general-purpose LLMs and dedicated KPA models exhibit this behavior, but have complementary strengths. Further, we observe that diversification of training data may improve generalization. Addressing diversity in argument summarization requires a mix of strategies to deal with subjectivity.

Paper Structure (39 sections, 2 equations, 4 figures, 16 tables)

Introduction
Contributions
Related Work
Key Point Analysis
Opinion Summarization
Diversity in Societal Decision Making
Method
Task setup
Modeling Diversity in Key Point Analysis
(1) Long tail opinions
(2) Annotators
(3) Data sources
Experimental Setup
Data
Approaches
...and 24 more sections

Figures (4)

Figure 1: KPM performance when limiting data usage to a fraction $f$, starting with the long tail first.
Figure 2: KPM performance for all approaches on the different data sources in Perspectrum.
Figure 3: Number of arguments matched per claim (upper row) and key point (bottom row), sorted by frequency. The red dashed line shows the average number of arguments.
Figure 4: KPG performance when limiting data usage to a fraction $f$, starting with the long tail first.
