Uncovering Agendas: A Novel French & English Dataset for Agenda Detection on Social Media
Gregorios Katsios, Ning Sa, Ankita Bhaumik, Tomek Strzalkowski
TL;DR
This work tackles agenda detection on social media with limited labeled data by reframing the task as textual entailment. Models are pre-trained on standard NLI datasets and then fine-tuned on a bilingual agenda dataset derived from tweets about the 2022 French elections. Hypotheses are generated from the agenda labels, and the resulting entailment signals are interpreted as multi-label outputs. The study shows that textual-entailment approaches, especially multilingual T5-based models with RTE pre-training, outperform traditional classification and zero-shot baselines in a low-resource, multilingual setting. The dataset and code are released to enable further research and to support rapid detection of emergent influence campaigns across languages and media.
Abstract
The behavior and decision making of groups or communities can be dramatically influenced by individuals pushing particular agendas, e.g., to promote or disparage a person or an activity, or to call for action. In the examination of online influence campaigns, particularly those related to important political and social events, scholars often concentrate on identifying the sources responsible for setting and controlling the agenda (e.g., public media). In this article, we present a methodology for detecting specific instances of agenda control through social media where annotated data is limited or non-existent. Using a modest corpus of Twitter messages centered on the 2022 French Presidential Elections, we carry out a comprehensive evaluation of various approaches and techniques that can be applied to this problem. Our findings demonstrate that, by treating the task as a textual entailment problem, it is possible to overcome the requirement for a large annotated training dataset.
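The entailment reframing can be illustrated with a minimal Python sketch. It is not the authors' implementation: the label names, the hypothesis wording, the 0.5 threshold, and the keyword-based `toy_entail_prob` scorer are all assumptions made for illustration, and in practice the scorer would be replaced by an NLI model's entailment probability.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical agenda labels and hypothesis wording, loosely based on the
# examples in the abstract; the actual label set and templates may differ.
AGENDA_HYPOTHESES: Dict[str, str] = {
    "promote": "The author of this message promotes a person or an activity.",
    "disparage": "The author of this message disparages a person or an activity.",
    "call_to_action": "The author of this message calls for action.",
}


def build_pairs(tweet: str) -> List[Tuple[str, str, str]]:
    """Pair the tweet (premise) with one hypothesis per agenda label."""
    return [(label, tweet, hyp) for label, hyp in AGENDA_HYPOTHESES.items()]


def detect_agendas(
    tweet: str,
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[str]:
    """Return every label whose hypothesis is judged to be entailed.

    Each label is decided independently, so the output is multi-label.
    """
    return [
        label
        for label, premise, hyp in build_pairs(tweet)
        if entail_prob(premise, hyp) >= threshold
    ]


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in scorer based on keyword cues; a real system would call an NLI model."""
    cues = {
        "promotes": ("vote for", "support"),
        "disparages": ("liar", "disgrace"),
        "calls for action": ("join us", "march"),
    }
    text = premise.lower()
    for key, words in cues.items():
        if key in hypothesis and any(w in text for w in words):
            return 0.9
    return 0.1


labels = detect_agendas("Join us on Sunday and vote for change!", toy_entail_prob)
print(labels)
```

Because each label is turned into its own premise-hypothesis pair, adding a new agenda type only requires writing a new hypothesis sentence rather than collecting a new labeled training set, which is the property that makes the formulation attractive when annotated data is scarce.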
