A dataset of Open Source Intelligence (OSINT) Tweets about the Russo-Ukrainian war
Johannes Niu, Mila Stillman, Philipp Seeberger, Anna Kruspe
TL;DR
This work addresses OSINT-focused discourse on Twitter surrounding the Russo-Ukrainian war by building a targeted dataset through a two-step snowball sampling approach. The authors identify relevant OSINT accounts and collect their top-level Tweets from January 2022 to July 2023, resulting in about 1.9 million Tweets from 1,040 users, including a substantial number of media items and external links. First analyses cover temporal trends, language distribution, hashtags, and embedded Tweets, while initial experiments apply relevance classification and clustering to reveal topics and assess misinformation potential. The dataset complements broader war-related Twitter datasets and supports OSINT research on information diffusion and misinformation. The data is publicly available, and the authors outline directions for future enhancements and discuss ethical considerations.
Abstract
Open Source Intelligence (OSINT) refers to intelligence efforts based on freely available data. It has become a frequent topic of conversation on social media, where private users or networks can share their findings. Such data is highly valuable in conflicts, both for gaining a new understanding of the situation and for tracking the spread of misinformation. In this paper, we present a method for collecting such data as well as a novel OSINT dataset for the Russo-Ukrainian war, drawn from Twitter between January 2022 and July 2023. The dataset is built from an initial search for users posting OSINT and a subsequent snowballing approach to detect more such users. The final dataset contains almost 2 million Tweets posted by 1,040 users. We also provide some first analyses and experiments on the data, and make suggestions for its future usage.
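The abstract describes the collection method only at a high level: a seed search for OSINT users followed by snowballing. As a rough illustration of the general idea, the following minimal sketch grows a seed set by adding accounts that are referenced by enough already-known accounts. It is not the authors' implementation. The function name `snowball_users`, the `referenced_by` callback, the `min_refs` threshold, and the toy graph are all hypothetical, and the paper's actual expansion criterion may differ.

```python
from collections import Counter
from typing import Callable, Iterable, Set


def snowball_users(
    seeds: Iterable[str],
    referenced_by: Callable[[str], Iterable[str]],
    min_refs: int = 2,
    rounds: int = 1,
) -> Set[str]:
    """Grow a seed set by adding accounts referenced by enough known accounts.

    `referenced_by(user)` is a hypothetical callback returning the accounts
    that `user` mentions, retweets, or otherwise points to.
    """
    known = set(seeds)
    frontier = set(known)
    for _ in range(rounds):
        counts = Counter()
        for user in frontier:
            # Count each referenced account at most once per referencing user.
            for other in set(referenced_by(user)):
                if other not in known:
                    counts[other] += 1
        # Keep only candidates referenced by at least `min_refs` frontier users.
        new = {u for u, c in counts.items() if c >= min_refs}
        if not new:
            break
        known |= new
        frontier = new
    return known


# Toy reference graph, invented for illustration (not taken from the dataset).
toy_graph = {
    "seed_a": ["cand_x", "cand_y"],
    "seed_b": ["cand_x"],
    "seed_c": ["cand_x", "cand_z"],
}
result = snowball_users(toy_graph.keys(), lambda u: toy_graph.get(u, []), min_refs=2)
print(sorted(result))  # ['cand_x', 'seed_a', 'seed_b', 'seed_c']
```

In this toy run only `cand_x` is added, because it is the only candidate referenced by at least two seed accounts. In practice an expansion step like this would typically be paired with a manual or automated relevance check on each candidate account.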
