Can Large Language Models perform Relation-based Argument Mining?

Deniz Gorur, Antonio Rago, Francesca Toni

TL;DR

This study investigates binary relation-based argument mining (RbAM) using open-source Large Language Models (LLMs) with simple primed prompting, eliminating the need for fine-tuning. Across ten datasets, LLMs such as Llama 70B-4bit and Mixtral 8x7B-4bit outperform a strong RoBERTa baseline, achieving a macro F1 of up to about 75 and demonstrating cross-dataset robustness. The work highlights the practicality of prompt-based LLMs for cross-domain RbAM, while also outlining trade-offs in inference speed and hardware demands. It also points to future directions, including moving to ternary RbAM and improving attack-prediction accuracy, with potential downstream benefits for online debate platforms and evidence-gathering tasks.
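
For readers unfamiliar with the metric, macro F1 for the binary support/attack task is the unweighted mean of the per-class F1 scores, so poor performance on the rarer class is not masked by the frequent one. The sketch below is illustrative only: the function name, the label strings and the toy data are assumptions, not taken from the paper, and the result is on a 0-1 scale (multiply by 100 to compare with the figure quoted above).

```python
def macro_f1(gold, pred, labels=("support", "attack")):
    """Unweighted mean of per-class F1 scores (label names are assumed)."""
    scores = []
    for label in labels:
        # Count true positives, false positives and false negatives for this class.
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data, invented purely for illustration.
gold = ["support", "support", "support", "attack"]
pred = ["support", "support", "attack", "attack"]
score = macro_f1(gold, pred)
```

Because each class contributes equally, a model that predicts support well but attack poorly is penalised, which is consistent with the attention the summary gives to attack-prediction accuracy.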

Abstract

Argument mining (AM) is the process of automatically extracting arguments, their components and/or relations amongst arguments and components from text. As the number of platforms supporting online debate increases, the need for AM becomes ever more urgent, especially in support of downstream tasks. Relation-based AM (RbAM) is a form of AM focusing on identifying agreement (support) and disagreement (attack) relations amongst arguments. RbAM is a challenging classification task, with existing methods failing to perform satisfactorily. In this paper, we show that general-purpose Large Language Models (LLMs), appropriately primed and prompted, can significantly outperform the best performing (RoBERTa-based) baseline. Specifically, we experiment with two open-source LLMs (Llama-2 and Mistral) with ten datasets.
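
To make "primed and prompted" concrete, the following is a minimal sketch of how a few-shot primer and a prompt for an argument pair (A, B) might be assembled, and how a model completion might be mapped back to a binary label. The template wording, the field names (`Arg1`, `Arg2`, `Relation`) and all function names are hypothetical and are not the authors' actual prompt; the call to the LLM itself is deliberately left out.

```python
def format_pair(arg_a, arg_b, relation=None):
    """Render one argument pair; leave the relation open when it is unknown."""
    text = f"Arg1: {arg_a}\nArg2: {arg_b}\nRelation:"
    if relation is not None:
        text += f" {relation}"
    return text


def build_prompt(primer_examples, arg_a, arg_b):
    """Concatenate labelled primer examples, then the unlabelled target pair."""
    blocks = [format_pair(a, b, rel) for a, b, rel in primer_examples]
    blocks.append(format_pair(arg_a, arg_b))
    return "\n\n".join(blocks)


def parse_relation(completion):
    """Map a raw completion to 'support' or 'attack'; None if neither is found."""
    text = completion.strip().lower()
    if text.startswith("support"):
        return "support"
    if text.startswith("attack"):
        return "attack"
    return None


# Invented primer examples, for illustration only.
primer = [
    ("Cycling reduces traffic.", "Cities should build more bike lanes.", "support"),
    ("Bike lanes take space from cars.", "Cities should build more bike lanes.", "attack"),
]
prompt = build_prompt(primer, "Bike lanes lower accident rates.",
                      "Cities should build more bike lanes.")
label = parse_relation(" Support.")
```

Ending the prompt on an open `Relation:` field lets the model's first generated token carry the prediction, which keeps parsing simple; how unparseable completions are handled is a design choice left open here.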

Paper Structure (18 sections, 9 figures, 5 tables)

This paper contains 18 sections, 9 figures, 5 tables.

Figures (9)

  • Figure 1: Experimental pipeline with the (few-shot learning) primer and the prompt template P(A,B).
  • Figure 2: An example prompt drawn from the Essays dataset used in the RbAM experiments.
  • Figure 3: An example prompt drawn from the Microtexts dataset used in the RbAM experiments.
  • Figure 4: An example prompt drawn from the Nixon-Kennedy dataset used in the RbAM experiments.
  • Figure 5: An example prompt drawn from the Debatepedia/Procon dataset used in the RbAM experiments.
  • ...and 4 more figures