Unveiling Global Narratives: A Multilingual Twitter Dataset of News Media on the Russo-Ukrainian Conflict
Sherzod Hakimov, Gullal S. Cheema
TL;DR
The work addresses the need to understand global media narratives of the Russo-Ukrainian conflict through a multilingual, multimodal data resource. It collects about 1.5 million tweets in 60 languages, posted by news and media accounts between February 2022 and May 2023, and links each tweet with its image and with processed annotations for entities, stances, textual and visual concepts, and sentiment. The dataset fills gaps left by prior research that was language-limited, sentiment-focused, or lacked multimedia coverage from news outlets. It enables analyses of who the major actors are, how stances vary by origin, and how text and visuals together portray the event. The resource is intended to support cross-cultural narrative research and downstream tasks in multimedia discourse analysis, with implications for understanding media framing of the conflict.
Abstract
The ongoing Russo-Ukrainian conflict has been a subject of intense media coverage worldwide. Understanding the global narrative surrounding this topic is crucial for researchers who aim to gain insights into its multifaceted dimensions. In this paper, we present a novel multimedia dataset that focuses on this topic by collecting and processing tweets posted by news or media companies across the globe. We collected approximately 1.5 million tweets in 60 different languages, along with their images, posted between February 2022 and May 2023. Each entry in the dataset is accompanied by processed tags that allow for the identification of entities, stances, textual or visual concepts, and sentiment. This multimedia dataset serves as a valuable resource for researchers aiming to investigate the global narrative surrounding the ongoing conflict from various aspects, such as who the prominent entities are, what stances are taken, where these stances originate, and how the different textual and visual concepts related to the event are portrayed.
