Contextualizing Argument Quality Assessment with Relevant Knowledge

Darshan Deshpande; Zhivar Sourati; Filip Ilievski; Fred Morstatter

TL;DR

This work tackles argument quality assessment by contextualizing arguments with relevant knowledge. SPARK employs four LLM-driven augmentations (Feedback, Assumptions, Similar-quality, Counter-arguments) within a dual-encoder Transformer that jointly scores cogency, effectiveness, and reasonableness. Empirical results show SPARK outperforms strong baselines in both in-domain and zero-shot settings, with augmentations contributing complementary benefits and aligning variably with human judgments. The approach advances robust, context-aware argument analysis, though it faces challenges from LLM hallucinations and biases, motivating future work on broader domains and hybrid symbolic techniques.

Abstract

Automatic assessment of the quality of arguments has been recognized as a challenging task with significant implications for misinformation and targeted speech. While real-world arguments are tightly anchored in context, existing computational methods analyze their quality in isolation, which affects their accuracy and generalizability. We propose SPARK: a novel method for scoring argument quality based on contextualization via relevant knowledge. We devise four augmentations that leverage large language models to provide feedback, infer hidden assumptions, supply a similar-quality argument, or give a counter-argument. SPARK uses a dual-encoder Transformer architecture to enable the original argument and its augmentation to be considered jointly. Our experiments in both in-domain and zero-shot setups show that SPARK consistently outperforms existing techniques across multiple metrics.
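
To make the data flow of the dual-encoder idea concrete, here is a minimal sketch, not the authors' implementation. It assumes the argument and its augmentation are encoded separately and that the two representations are joined by concatenation before one linear output per quality dimension; the abstract does not specify the fusion step. The names `toy_encode`, `dual_encode`, `score`, and `DIM` are hypothetical, and the hashed bag-of-words encoder is only a stand-in for a Transformer encoder.

```python
import hashlib
from typing import Dict, List

DIM = 8  # toy embedding size (assumption; a real Transformer encoder is far larger)
DIMENSIONS = ("cogency", "effectiveness", "reasonableness")


def toy_encode(text: str, salt: str) -> List[float]:
    """Stand-in for a Transformer encoder: hashed bag-of-words, averaged over tokens."""
    vec = [0.0] * DIM
    tokens = text.lower().split()
    for tok in tokens:
        digest = hashlib.sha256((salt + tok).encode("utf-8")).digest()
        vec[digest[0] % DIM] += 1.0
    n = max(len(tokens), 1)
    return [v / n for v in vec]


def dual_encode(argument: str, augmentation: str) -> List[float]:
    # Two separate encoders (different salts mimic separate weights).
    # Joining them by concatenation is an assumption made for this sketch.
    return toy_encode(argument, "arg") + toy_encode(augmentation, "aug")


def score(features: List[float], weights: List[List[float]],
          biases: List[float]) -> Dict[str, float]:
    # One linear output per quality dimension over the joint representation.
    return {
        name: sum(w * x for w, x in zip(weights[i], features)) + biases[i]
        for i, name in enumerate(DIMENSIONS)
    }


# Usage: score one argument together with an "assumptions"-style augmentation.
features = dual_encode(
    "School uniforms reduce bullying.",
    "Assumes bullying is driven mainly by clothing.",
)
weights = [[0.1] * (2 * DIM) for _ in DIMENSIONS]  # untrained placeholder weights
scores = score(features, weights, [0.0, 0.0, 0.0])
```

In a trained system the placeholder weights would be learned from annotated quality scores, and each toy encoder would be replaced by a pretrained language model; the sketch only shows that the three scores are produced jointly from both inputs.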

Paper Structure (26 sections, 3 figures, 6 tables)

Introduction
Background
SPARK
Augmentation strategies.
Dual-encoder architecture.
Experiments
Baselines
Scoring models.
Alternative augmentation strategies.
Datasets and Evaluation
Results
Effect of augmentation on in-domain performance (Q1).
Comparison of augmentation variants (Q2).
Human judgment of augmentations (Q3).
How augmentations affect quality scores (Q4)
...and 11 more sections

Figures (3)

Figure 1: Overview of SPARK.
Figure 2: Dual BERT encoder architecture.
Figure 3: Distributions for augmentation lengths for the training, validation, and testing splits, respectively.