Towards a Perspectivist Turn in Argument Quality Assessment

Julia Romberg, Maximilian Maurer, Henning Wachsmuth, Gabriella Lapesa

TL;DR

The paper addresses the subjectivity of argument quality (AQ) assessment and promotes a perspectivist turn to accommodate multiple valid perspectives. It conducts a systematic review of 103 AQ datasets, developing taxonomies for what is annotated and who annotates, and provides a comprehensive meta-information database, including a deep dive into 24 non-aggregated datasets. Through a pilot analysis of disagreement patterns and cross-group annotation experiments, the authors reveal substantial within- and across-group variation and limited cross-group transfer for aggregated labels, while perspectivist distributions can reveal systematic group-specific patterns. The work establishes resource-rich foundations for perspectivist AQ research, emphasizing diverse annotator attributes, transparency, and large-scale non-aggregated data collection as prerequisites for fairer, more robust AQ modeling and evaluation.

Abstract

The assessment of argument quality depends on well-established logical, rhetorical, and dialectical properties that are unavoidably subjective: multiple valid assessments may exist, and there is no unequivocal ground truth. This aligns with recent paths in machine learning, which embrace the co-existence of different perspectives. However, this potential remains largely unexplored in NLP research on argument quality. One crucial reason seems to be the yet unexplored availability of suitable datasets. We fill this gap by conducting a systematic review of argument quality datasets. We assign them to a multi-layered categorization targeting two aspects: (a) What has been annotated: we collect the quality dimensions covered in datasets and consolidate them in an overarching taxonomy, increasing dataset comparability and interoperability. (b) Who annotated: we survey what information is given about annotators, enabling perspectivist research and grounding our recommendations for future actions. To this end, we discuss datasets suitable for developing perspectivist models (i.e., those containing individual, non-aggregated annotations), and we showcase the importance of a controlled selection of annotators in a pilot study.

Paper Structure

This paper contains 63 sections, 2 figures, and 5 tables.

Figures (2)

  • Figure 1: Frequency and distribution of AQ categories (major and sub-categories) as assigned to datasets, grouped by the four major categories and overall AQ.
  • Figure 2: Instance-based aggregation of label decisions for overall AQ, assessed on a scale from 1 (low) to 3 (high), between two annotator groups on Dagstuhl, with a fitted linear regression model highlighting their relationship.
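The instance-based comparison described for Figure 2 can be illustrated with a minimal sketch. It is not the authors' code or data: the annotation rows, the group names "A" and "B", and the helper names `group_means` and `fit_line` are all hypothetical, and the per-instance mean is only one possible way to aggregate labels. The sketch averages non-aggregated labels per instance and annotator group, then fits an ordinary least-squares line relating the two groups' per-instance scores.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical non-aggregated annotations (not from the paper):
# (instance_id, annotator_group, overall AQ label on a 1-3 scale).
annotations = [
    ("arg1", "A", 1), ("arg1", "A", 2), ("arg1", "B", 1), ("arg1", "B", 1),
    ("arg2", "A", 2), ("arg2", "A", 2), ("arg2", "B", 2), ("arg2", "B", 3),
    ("arg3", "A", 3), ("arg3", "A", 3), ("arg3", "B", 2), ("arg3", "B", 2),
]


def group_means(rows):
    """Average the individual labels for each (instance, group) pair."""
    buckets = defaultdict(list)
    for instance, group, label in rows:
        buckets[(instance, group)].append(label)
    return {key: mean(labels) for key, labels in buckets.items()}


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


means = group_means(annotations)
instances = sorted({instance for instance, _ in means})
xs = [means[(i, "A")] for i in instances]  # group A score per instance
ys = [means[(i, "B")] for i in instances]  # group B score per instance
slope, intercept = fit_line(xs, ys)
print(f"group B score = {slope:.2f} * group A score + {intercept:.2f}")
```

On this toy data, a slope below 1 means that group B's scores rise more slowly than group A's, which is the kind of across-group difference such a plot is meant to surface. Keeping the individual rows rather than only the averages is what makes this per-group view possible.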