NLAS-multi: A Multilingual Corpus of Automatically Generated Natural Language Argumentation Schemes
Ramon Ruiz-Dolz, Joaquin Taverner, John Lawrence, Chris Reed
TL;DR
The paper addresses limitations in multilingual argument mining by automatically generating a large, Walton-based corpus of natural language argumentation schemes (NLAS) through prompting large language models (LLMs). It introduces NLAS-multi, the largest publicly available corpus of natural language argumentation schemes, which instantiates 20 Walton schemes over 50 topics in English and Spanish. The corpus is generated with a two-stage GPT-3.5-turbo and GPT-4 workflow and validated by humans. The paper also provides strong baselines for the automatic classification of NLAS using RoBERTa-family models in monolingual and multilingual settings, which achieve high macro-F1 scores. Together, the resource and models enable scalable, cross-language analysis of argumentation schemes and open pathways to more advanced argument mining tasks.
Abstract
Some of the major limitations identified in the areas of argument mining, argument generation, and natural language argument analysis relate to the complexity of annotating argumentatively rich data, the limited size of the resulting corpora, and the constraints imposed by the different languages and domains in which these data are annotated. To address these limitations, this paper presents the following contributions: (i) an effective methodology for the automatic generation of natural language arguments across different topics and languages, (ii) the largest publicly available corpus of natural language argumentation schemes, and (iii) a set of solid baselines and fine-tuned models for the automatic identification of argumentation schemes.
