Stanceosaurus 2.0: Classifying Stance Towards Russian and Spanish Misinformation

Anton Lavrouk; Ian Ligon; Tarek Naous; Jonathan Zheng; Alan Ritter; Wei Xu

TL;DR

Stanceosaurus 2.0 extends a 5-way stance dataset to Russian and Spanish, enabling fine-grained, per-tweet misinformation analysis in two high-stakes languages. Using zero-shot cross-lingual transfer with multilingual BERT, the authors report a macro $F1$ of about $43$ for both languages, which suggests that stance classifiers can generalize across languages even with limited native stance data. The work details the data collection pipeline, claim sources, annotation protocol, and language-specific challenges (code-switching, obscenities, and filter circumvention), and documents these steps for reproducibility. The dataset supports cross-cultural misinformation research and raises practical considerations around ethical data sharing and platform biases, with room for future improvement through more annotators and broader modeling approaches. Overall, Stanceosaurus 2.0 shows that transformer-based stance classification can be a useful tool for identifying multicultural misinformation and for guiding further multilingual misinformation studies.

Abstract

The Stanceosaurus corpus (Zheng et al., 2022) was designed to provide high-quality, annotated, 5-way stance data extracted from Twitter, suitable for analyzing cross-cultural and cross-lingual misinformation. In the Stanceosaurus 2.0 iteration, we extend this framework to encompass Russian and Spanish. The former is of current significance due to prevalent misinformation amid escalating tensions with the West and the violent incursion into Ukraine. The latter, meanwhile, represents an enormous community that has been largely overlooked on major social media platforms. By incorporating an additional 3,874 Spanish and Russian tweets over 41 misinformation claims, our objective is to support research focused on these issues. To demonstrate the value of this data, we employed zero-shot cross-lingual transfer on multilingual BERT, yielding results on par with the initial Stanceosaurus study with a macro F1 score of 43 for both languages. This underlines the viability of stance classification as an effective tool for identifying multicultural misinformation.
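The macro F1 reported above averages per-class F1 scores with equal weight, so rare stance classes count as much as frequent ones. The sketch below shows one way to compute it for a 5-way label set. The label names and the toy predictions are assumptions for illustration and are not taken from this page, and the function returns a value in [0, 1], whereas the abstract reports the score on a 0 to 100 scale.

```python
from collections import Counter

# Hypothetical 5-way stance label names, assumed for illustration only.
LABELS = ["supporting", "refuting", "discussing", "querying", "irrelevant"]


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of per-class F1 over `labels`, in [0, 1]."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # A class that is never predicted and never gold scores 0 here.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy example (made-up labels, not data from the corpus).
gold = ["supporting", "refuting", "discussing", "querying", "irrelevant", "irrelevant"]
pred = ["supporting", "refuting", "irrelevant", "querying", "irrelevant", "discussing"]
score = macro_f1(gold, pred)
print(f"macro F1: {100 * score:.1f}")
```

Because every class contributes equally, a model that ignores a minority stance is penalized heavily, which is one reason macro averaging is a common choice for imbalanced stance data.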

Paper Structure (39 sections, 5 figures, 4 tables)

Introduction
Russian Misinformation
Spanish Misinformation
Stanceosaurus 2.0: Details
Data Collection
Misinformation Claims
Tweet Collection & Reply Chains
Russian Corpus
Russian Twitter
Code Switching
Obscenities
Spanish Corpus
Circumventing Filters
Social Media Usage
Code Switching
...and 24 more sections

Figures (5)

Figure 1: Example of a data point (tweet and context) in the Russian Stanceosaurus dataset. For the claim "NATO forces are currently fighting in Ukraine", we have an example tweet chain demonstrating various stances.
Figure 2: Label distribution for tweets (by query, not context) in the (a) Russian dataset and (b) Spanish dataset.
Figure 3: Russian Claims and Queries
Figure 4: Part 1 of Spanish Claims and Queries
Figure 5: Part 2 of Spanish Claims and Queries
