Argument Mining in Data Scarce Settings: Cross-lingual Transfer and Few-shot Techniques
Anar Yeginbergen, Maite Oronoz, Rodrigo Agerri
TL;DR
This study tackles data scarcity in Argument Mining by systematically comparing data-transfer, model-transfer, and few-shot prompting on a multilingual medical-abstract corpus derived from AbstRCT. Contrary to prior cross-lingual findings on other sequence labelling tasks, data-transfer delivers stronger results than model-transfer in the monolingual setting, and multilingual data-transfer provides the best overall results. Few-shot prompting via EntLM is generally outperformed by both data-transfer and full fine-tuning, although fine-tuning with around 20% of the data can already be competitive. The work highlights that the length and complexity of argument spans and the data-sampling strategy crucially influence few-shot effectiveness, and it calls for broader domain validation to establish generalizability.
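EntLM, named above, is a template-free prompting method: instead of training a classification head, it trains the masked-language-modelling head to predict a label word at every token inside a labelled span and the original token everywhere else. The Python sketch below illustrates that target construction for BIO-tagged argument spans, plus a naive decoding back to tags. It is a word-level simplification (real implementations operate on subword tokens), and the label words `claim` and `premise`, like the helper names `entlm_targets` and `decode_predictions`, are hypothetical choices for illustration, not the configuration used in this work.

```python
from typing import Dict, List

# Hypothetical label words for illustration only; EntLM normally selects
# label words from the data rather than using fixed class names.
LABEL_WORDS: Dict[str, str] = {"Claim": "claim", "Premise": "premise"}


def entlm_targets(tokens: List[str], tags: List[str], label_words: Dict[str, str]) -> List[str]:
    """Build EntLM-style LM targets: O tokens predict themselves,
    tokens inside an argument span predict their class's label word."""
    if len(tokens) != len(tags):
        raise ValueError("tokens and tags must have the same length")
    targets = []
    for token, tag in zip(tokens, tags):
        if tag == "O":
            targets.append(token)
        else:
            cls = tag.split("-", 1)[1]  # strip the "B-" / "I-" prefix
            targets.append(label_words[cls])
    return targets


def decode_predictions(predicted: List[str], label_words: Dict[str, str]) -> List[str]:
    """Map predicted words back to BIO tags. Naive: consecutive tokens of the
    same class are merged into one span, so adjacent same-class spans collapse."""
    word_to_class = {word: cls for cls, word in label_words.items()}
    tags = []
    prev_cls = None
    for word in predicted:
        cls = word_to_class.get(word)
        if cls is None:
            tags.append("O")
        elif cls == prev_cls:
            tags.append(f"I-{cls}")
        else:
            tags.append(f"B-{cls}")
        prev_cls = cls
    return tags


if __name__ == "__main__":
    tokens = ["Treatment", "reduced", "pain", "significantly", "."]
    tags = ["B-Claim", "I-Claim", "I-Claim", "I-Claim", "O"]
    targets = entlm_targets(tokens, tags, LABEL_WORDS)
    print(targets)  # ['claim', 'claim', 'claim', 'claim', '.']
    print(decode_predictions(targets, LABEL_WORDS))  # recovers the original tags
```

Under this formulation every token of a long, multi-clause argument component has to be mapped to the correct label word on its own, which is one plausible reason why span length and complexity weigh heavily on few-shot performance.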
Abstract
Recent research on sequence labelling has been exploring different strategies to mitigate the lack of manually annotated data for the large majority of the world's languages. Among others, the most successful approaches have been based on (i) the cross-lingual transfer capabilities of multilingual pre-trained language models (model-transfer), (ii) data translation and label projection (data-transfer) and (iii) prompt-based learning by reusing the mask objective to exploit the few-shot capabilities of pre-trained language models (few-shot). Previous work seems to conclude that model-transfer outperforms data-transfer methods and that few-shot techniques based on prompting are superior to updating the model's weights via fine-tuning. In this paper, we empirically demonstrate that, for Argument Mining, a sequence labelling task that requires the detection of long and complex discourse structures, previous insights on cross-lingual transfer or few-shot learning do not apply. Contrary to previous work, we show that for Argument Mining data-transfer obtains better results than model-transfer and that fine-tuning outperforms few-shot methods. Regarding the former, the domain of the dataset used for data-transfer seems to be a deciding factor, while, for few-shot, the type of task (length and complexity of the sequence spans) and the sampling method prove to be crucial.
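Data-transfer, as described above, relies on translating the annotated text and then projecting its labels onto the translation. One common projection strategy, shown here as a minimal Python sketch and not necessarily the method used in this work, maps each source argument span onto the smallest contiguous target span that covers its aligned tokens. The word alignment is assumed to come from an external aligner and is hard-coded in the example; the helper names `bio_to_spans` and `project_labels` are hypothetical.

```python
from typing import List, Optional, Tuple

Span = Tuple[int, int, str]  # (start, end_inclusive, class)


def bio_to_spans(tags: List[str]) -> List[Span]:
    """Convert BIO tags into (start, end, class) spans; an I- tag that does not
    continue the current class is treated as the start of a new span."""
    spans: List[Span] = []
    start: Optional[int] = None
    cls: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag.startswith("B-") or tag == "O" or (tag.startswith("I-") and tag[2:] != cls):
            if start is not None:
                spans.append((start, i - 1, cls))
                start, cls = None, None
            if tag != "O":
                start, cls = i, tag[2:]
    if start is not None:
        spans.append((start, len(tags) - 1, cls))
    return spans


def project_labels(src_tags: List[str], tgt_len: int,
                   alignment: List[Tuple[int, int]]) -> List[str]:
    """Project source BIO spans onto the target sentence via word alignments
    given as (source_index, target_index) pairs."""
    tgt_tags = ["O"] * tgt_len
    for start, end, cls in bio_to_spans(src_tags):
        aligned = sorted({t for s, t in alignment if start <= s <= end})
        if not aligned:
            continue  # the span is lost if none of its tokens is aligned
        t_start, t_end = aligned[0], aligned[-1]
        # Keep the first projected span if two spans would overlap.
        if any(tag != "O" for tag in tgt_tags[t_start:t_end + 1]):
            continue
        tgt_tags[t_start] = f"B-{cls}"
        for k in range(t_start + 1, t_end + 1):
            tgt_tags[k] = f"I-{cls}"
    return tgt_tags


if __name__ == "__main__":
    src_tags = ["O", "B-Claim", "I-Claim", "I-Claim", "O"]  # The drug reduced pain .
    tgt_tokens = ["El", "fármaco", "redujo", "el", "dolor", "."]
    alignment = [(0, 0), (1, 1), (2, 2), (3, 4), (4, 5)]  # assumed aligner output
    print(project_labels(src_tags, len(tgt_tokens), alignment))
    # ['O', 'B-Claim', 'I-Claim', 'I-Claim', 'I-Claim', 'O']
```

Taking the contiguous hull lets the unaligned Spanish article `el` fall inside the projected claim. With long argumentative spans, however, a single misaligned token can stretch or shift the projected span, which is one way projection errors can enter the translated training data.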
