An Empirical Analysis of Diversity in Argument Summarization
Michiel van der Meer, Piek Vossen, Catholijn M. Jonker, Pradeep K. Murukannaiah
TL;DR
This paper investigates diversity in argument summarization by examining three facets (diversity of opinions, annotators, and data sources) within the Key Point Analysis (KPA) framework. It benchmarks prompt-based LLMs (e.g., ChatGPT in open-book and closed-book modes) and a dedicated KPA model (SMatchToPR/Debater) on the ArgKP, Perspectrum, and PVE datasets, focusing on long-tail opinions, annotator disagreement, and cross-source transfer. The findings show that LLMs excel at generating key points but struggle to reliably match arguments to key points, while dedicated models can outperform LLMs on certain datasets but do not dominate across tasks. Diversifying the training data may improve generalization. The study highlights the need for diversity-aware strategies in practical argument summarization and outlines ethical and methodological considerations for deploying such systems in sensitive contexts.
Abstract
Presenting high-level arguments is a crucial task for fostering participation in online societal discussions. Current argument summarization approaches miss a key facet of this task: capturing diversity, which is important for accommodating multiple perspectives. We introduce three aspects of diversity: those of opinions, annotators, and sources. Our evaluation of approaches to a popular argument summarization task, Key Point Analysis, shows that they struggle to (1) represent arguments shared by few people, (2) deal with data from various sources, and (3) align with subjectivity in human-provided annotations. We find that both general-purpose LLMs and dedicated KPA models exhibit this behavior, but have complementary strengths. Further, we observe that diversification of training data may improve generalization. Addressing diversity in argument summarization requires a mix of strategies to deal with subjectivity.
