IDIAPers @ Causal News Corpus 2022: Efficient Causal Relation Identification Through a Prompt-based Few-shot Approach
Sergio Burdisso, Juan Zuluaga-Gomez, Esau Villatoro-Tello, Martin Fajcik, Muskaan Singh, Pavel Smrz, Petr Motlicek
TL;DR
This work tackles causal relation identification (CRI) in news text for CASE-2022 by framing the task as masked language modeling (MLM) in a few-shot, prompt-based setup. It shows that a model fine-tuned on only $k$ instances per class (e.g., $k=256$, about $15.7\%$ of the available data) with augmented prompts can achieve precision, accuracy, and F1 competitive with ensemble methods trained on all available data. On the official test set, the prompt-based approach generalizes well, reaching an F1 of $0.85$, close to the winning team's $0.86$, while being far more data-efficient than the ensemble baselines. The study also analyzes dataset properties and ensemble strategies, highlighting the practical value of prompt-based few-shot methods for CRI in low-resource settings and for rapid evaluation. Overall, the work suggests that prompt-based MLM with demonstrations is a viable, efficient alternative for causal event classification in NLP applications.
Abstract
In this paper, we describe our participation in Subtask 1 of CASE-2022, Event Causality Identification with Causal News Corpus. We address the Causal Relation Identification (CRI) task by exploiting a set of simple yet complementary techniques for fine-tuning language models (LMs) on a small number of annotated examples (i.e., a few-shot configuration). We follow a prompt-based prediction approach for fine-tuning LMs in which the CRI task is treated as a masked language modeling (MLM) problem. This approach allows LMs natively pre-trained on MLM problems to directly generate textual responses to CRI-specific prompts. We compare the performance of this method against ensemble techniques trained on the entire dataset. Our best-performing submission was fine-tuned with only 256 instances per class, 15.7% of all available data, and yet obtained the second-best precision (0.82), the third-best accuracy (0.82), and an F1-score (0.85) very close to that reported by the winning team (0.86).
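
To make the prompt-based formulation concrete, the following is a minimal sketch of how a CRI instance could be cast as an MLM query and how a few-shot subset of $k$ instances per class could be drawn. The template wording, the label words ("yes"/"no"), the `[MASK]` token string, and all function names are illustrative assumptions, not the prompts or code used in the submission. The model call is left out: the sketch assumes some masked LM has already produced a score for each candidate word at the mask position.

```python
import random
from collections import defaultdict

# Assumptions for illustration only: the mask token, template and label words
# below are hypothetical and are not taken from the paper.
MASK = "[MASK]"
TEMPLATE = "{sentence} Is there a causal relation? {mask}."
VERBALIZER = {1: "yes", 0: "no"}  # class label -> word expected at the mask


def build_prompt(sentence, mask=MASK):
    """Wrap a sentence in the template so the class becomes a word to fill in."""
    return TEMPLATE.format(sentence=sentence.strip(), mask=mask)


def sample_few_shot(examples, k, seed=0):
    """Draw up to k (text, label) pairs per class from the annotated examples."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    subset = []
    for label in sorted(by_label):
        items = by_label[label]
        subset.extend(rng.sample(items, min(k, len(items))))
    return subset


def predict_from_mask_scores(scores):
    """Return the class whose label word has the highest score at the mask.

    `scores` maps candidate words to the score a masked LM assigned them at
    the mask position; label words missing from it are treated as -inf.
    """
    return max(
        VERBALIZER,
        key=lambda label: scores.get(VERBALIZER[label], float("-inf")),
    )
```

In this sketch, fine-tuning would consist of training the LM to place the correct label word at the mask for the sampled subset, and prediction reduces to comparing the scores of the label words only, rather than training a separate classification head.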
