SCRum-9: Multilingual Stance Classification over Rumours on Social Media
Yue Li, Jake Vasilakes, Zhixue Zhao, Carolina Scarton
TL;DR
SCRum-9 introduces the largest multilingual rumour stance classification benchmark to date, spanning 9 languages with 7,516 tweet–reply pairs linked to 2,156 fact-checked rumours and annotated with confidence and second-choice labels to capture annotator uncertainty. The work provides a comprehensive data collection and annotation protocol, including topic-based filtering and a two-round adjudication process. It benchmarks both LLM-based in-context learning (ICL) and multilingual MLM fine-tuning, the latter augmented with multilingual synthetic data generated by LLMs. Key findings show substantial cross-language variation in ICL performance, with translation and few-shot demonstrations often helping in non-English cases. Synthetic multilingual data can also bring fine-tuned MLMs to competitive or superior performance while reducing compute costs. SCRum-9 opens new avenues for multilingual rumour analysis, uncertainty studies, and downstream tasks such as claim verification, and is publicly released to spur further research.
Abstract
We introduce SCRum-9, the largest multilingual Stance Classification dataset for Rumour analysis in 9 languages, containing 7,516 tweets from X. SCRum-9 goes beyond existing stance classification datasets by covering more languages, linking examples to more fact-checked claims (2.1k), and including confidence-related annotations from multiple annotators to account for intra- and inter-annotator variability. Annotations were made by at least two native speakers per language, totalling more than 405 hours of annotation and 8,150 dollars in compensation. Further, SCRum-9 is used to benchmark five large language models (LLMs) and two multilingual masked language models (MLMs) in In-Context Learning (ICL) and fine-tuning setups. This paper also explores the use of multilingual synthetic data for rumour stance classification, showing that even LLMs with weak ICL performance can produce valuable synthetic data for fine-tuning small MLMs, enabling them to outperform zero-shot ICL with LLMs. Finally, we examine the relationship between model predictions and human uncertainty on ambiguous cases, finding that model predictions often match the second-choice labels assigned by annotators rather than diverging entirely from human judgments. SCRum-9 is publicly released to the research community, with the potential to foster further research on multilingual analysis of misleading narratives on social media.
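
To make the second-choice comparison concrete, the following is a minimal sketch of how one might measure how often a model's prediction agrees with an annotator's first-choice label, falls back to the second-choice label, or matches neither. The field names (`pred`, `first`, `second`) and the stance labels used in the example are illustrative assumptions, not the released dataset schema or the authors' evaluation code.

```python
from collections import Counter


def second_choice_agreement(examples):
    """Return the share of predictions matching the first-choice label,
    the second-choice label, or neither.

    `examples` is an iterable of dicts with hypothetical keys:
      'pred'   - model prediction
      'first'  - annotator's first-choice label
      'second' - annotator's second-choice label, or None if not given
    """
    counts = Counter()
    for ex in examples:
        if ex["pred"] == ex["first"]:
            counts["first"] += 1
        elif ex.get("second") is not None and ex["pred"] == ex["second"]:
            counts["second"] += 1
        else:
            counts["neither"] += 1

    total = sum(counts.values())
    if total == 0:
        return {}
    return {key: counts[key] / total for key in ("first", "second", "neither")}


# Toy data with assumed stance labels (not taken from the dataset).
toy_examples = [
    {"pred": "deny", "first": "deny", "second": None},
    {"pred": "query", "first": "query", "second": "comment"},
    {"pred": "comment", "first": "support", "second": "comment"},
    {"pred": "deny", "first": "support", "second": "comment"},
]
agreement = second_choice_agreement(toy_examples)
```

A high `second` share relative to `neither` would indicate that apparent model errors often coincide with labels the annotators themselves considered plausible.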
