Diversity Over Size: On the Effect of Sample and Topic Sizes for Topic-Dependent Argument Mining Datasets

Benjamin Schiller; Johannes Daxenberger; Andreas Waldis; Iryna Gurevych

TL;DR

This work investigates the effect of dataset composition for topic-dependent argument mining (TDAM) in few- and zero-shot settings. It shows that, while fine-tuning is mandatory to achieve acceptable model performance, carefully composed training samples allow the training sample size to be reduced while still yielding 95% of the maximum performance.

Abstract

Argument Mining, that is, extracting and classifying argument components for a specific topic from large document sources, is inherently difficult for machine learning models and humans alike, as large Argument Mining datasets are rare and recognizing argument components requires expert knowledge. The task becomes even more difficult if it also involves stance detection of the retrieved arguments. In this work, we investigate the effect of Argument Mining dataset composition in few- and zero-shot settings. Our findings show that, while fine-tuning is mandatory to achieve acceptable model performance, using carefully composed training samples and reducing the training sample size by up to almost 90% can still yield 95% of the maximum performance. This gain is consistent across three Argument Mining tasks on three different datasets. We also publish a new dataset for future benchmarking.
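As a rough illustration of what composing a smaller but topic-diverse training set could look like, the sketch below draws a fixed budget of samples round-robin across topics. It is not the authors' procedure: the function name `topic_balanced_subsample`, the `topic` field, and the toy data are all hypothetical, and the sketch assumes that "carefully composed" means spreading the budget evenly over as many topics as possible.

```python
import random
from collections import defaultdict


def topic_balanced_subsample(samples, budget, seed=0):
    """Pick up to `budget` samples, cycling over topics round-robin.

    Hypothetical helper: each sample is assumed to be a dict with a
    "topic" key. Topics with few samples simply run out early.
    """
    rng = random.Random(seed)

    # Group samples by topic and shuffle within each topic.
    by_topic = defaultdict(list)
    for sample in samples:
        by_topic[sample["topic"]].append(sample)
    for pool in by_topic.values():
        rng.shuffle(pool)

    # Visit topics in a random but reproducible order.
    topics = sorted(by_topic)
    rng.shuffle(topics)

    picked = []
    while len(picked) < budget and any(by_topic[t] for t in topics):
        for t in topics:
            if by_topic[t] and len(picked) < budget:
                picked.append(by_topic[t].pop())
    return picked


# Toy data (invented for illustration): 10 topics with 100 sentences each.
toy = [
    {"topic": f"topic-{t}", "text": f"sentence {i}"}
    for t in range(10)
    for i in range(100)
]

# Keep 10% of the data, i.e. a 90% reduction, spread evenly over topics.
subset = topic_balanced_subsample(toy, budget=100)
```

With this toy input, the 100 retained samples are split evenly across the 10 topics. Whether such a subset actually preserves model performance is an empirical question that the experiments in the paper address; the sketch only shows the sampling step.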

Paper Structure (27 sections, 7 figures, 8 tables)

Introduction
Related Work
Data
UKP-Corpus
FS150T-Corpus
IAM-Corpus
IBM-Corpus
Method
Sample experiments
Topic experiments
Dataset experiments
Models
ERNIE 2.0
FLAN-T5 XL
Llama2-70B, ChatGPT
...and 12 more sections

Figures (7)

Figure 1: Sample experiments on the FS150T-Corpus
Figure 2: Sample experiments on the IAM-Corpus
Figure 3: Sample experiments on the IBM-Corpus
Figure 4: Topic experiments for the FS150T-, IAM-, and IBM-Corpus on ERNIE 2.0 and FLAN-T5 XL, in macro $F_1$.
Figure 5: Sample experiments for the FS150T-, IAM-, and IBM-Corpus on ERNIE 2.0, FLAN-T5 XL, Llama2-70B, and ChatGPT, in macro $F_1$ with standard deviation.
...and 2 more figures
