Uncovering Agendas: A Novel French & English Dataset for Agenda Detection on Social Media

Gregorios Katsios, Ning Sa, Ankita Bhaumik, Tomek Strzalkowski

TL;DR

This work tackles agenda detection on social media with limited labeled data by reframing the task as textual entailment. It pre-trains models on standard NLI datasets and fine-tunes them on a bilingual agenda dataset derived from tweets about the 2022 French elections, generating hypotheses from agenda labels and interpreting entailment signals as multi-label outputs. The study shows that textual-entailment approaches, especially multilingual T5-based models with RTE pre-training, outperform traditional classification and zero-shot baselines in a low-resource, multilingual setting. The dataset and code are released to enable further research, supporting rapid detection of emergent influence campaigns across languages and media.

Abstract

The behavior and decision-making of groups or communities can be dramatically influenced by individuals pushing particular agendas, e.g., to promote or disparage a person or an activity, to call for action, etc. In the examination of online influence campaigns, particularly those related to important political and social events, scholars often concentrate on identifying the sources responsible for setting and controlling the agenda (e.g., public media). In this article we present a methodology for detecting specific instances of agenda control through social media where annotated data is limited or non-existent. Using a modest corpus of Twitter messages centered on the 2022 French Presidential Elections, we carry out a comprehensive evaluation of various approaches and techniques that can be applied to this problem. Our findings demonstrate that by treating the task as a textual entailment problem, it is possible to overcome the requirement for a large annotated training dataset.
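
To make the entailment framing concrete, the following is a minimal sketch, not the authors' implementation. Each agenda label is turned into a hypothesis, the tweet serves as the premise, and every label whose hypothesis is scored as entailed is returned, which yields a multi-label output. The label names, the hypothesis template, the 0.5 threshold, and the keyword-based `toy_entail_prob` scorer are all assumptions made for illustration; in practice the scorer would be a fine-tuned NLI model.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical agenda labels and hypothesis template (assumptions for this
# sketch; the paper's actual label set and wording are not given here).
AGENDA_LABELS = ["promote a candidate", "disparage a candidate", "call for action"]
TEMPLATE = "The author of this message wants to {label}."

# Keyword cues used only by the toy scorer below (also hypothetical).
CUES = {
    "promote": ("vote for", "support"),
    "disparage": ("corrupt", "liar"),
    "call for action": ("join us", "march"),
}


def build_pairs(tweet: str, labels: List[str]) -> Dict[str, Tuple[str, str]]:
    """Pair the tweet (premise) with one hypothesis per agenda label."""
    return {label: (tweet, TEMPLATE.format(label=label)) for label in labels}


def toy_entail_prob(premise: str, hypothesis: str) -> float:
    """Stand-in for an NLI model: a crude keyword match, not real inference."""
    text = premise.lower()
    for key, cues in CUES.items():
        if key in hypothesis and any(cue in text for cue in cues):
            return 0.9
    return 0.1


def detect_agendas(
    tweet: str,
    labels: List[str],
    entail_prob: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Return every label whose hypothesis is scored as entailed by the tweet."""
    pairs = build_pairs(tweet, labels)
    return [
        label
        for label, (premise, hypothesis) in pairs.items()
        if entail_prob(premise, hypothesis) >= threshold
    ]


if __name__ == "__main__":
    tweet = "Vote for X on Sunday and join us at the march!"
    print(detect_agendas(tweet, AGENDA_LABELS, toy_entail_prob))
```

Because each label is scored independently, a single tweet can receive zero, one, or several agenda labels, and adding a new agenda only requires writing a new hypothesis rather than collecting a new labeled training set.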

Paper Structure (30 sections, 2 figures, 8 tables)

Figures (2)

  • Figure 1: Confusion matrix of agenda-rte-bi-mT5 French results. The extra labels are in the bottom row and the missed labels are in the rightmost column.
  • Figure 2: Confusion matrix for agenda-rte-bi-mT5 French results on Run 1. The extra labels are in the bottom row and the missed labels are in the rightmost column.