Labeled Datasets for Research on Information Operations

Ozgur Can Seckin, Manita Pote, Alexander Nwala, Lake Yin, Luca Luceri, Alessandro Flammini, Filippo Menczer

TL;DR

We present new labeled datasets covering 26 campaigns. They contain both IO posts verified by a social media platform and over 13M posts by 303k accounts that discussed similar topics in the same time frames. The datasets are intended to facilitate the study of narratives, network interactions, and engagement strategies employed by coordinated accounts across various campaigns and countries.

Abstract

Social media platforms have become a hub for political activities and discussions, democratizing participation in these endeavors. However, they have also become an incubator for manipulation campaigns, like information operations (IOs). Some social media platforms have released datasets related to such IOs originating from different countries. However, we lack comprehensive control data that can enable the development of IO detection methods. To bridge this gap, we present new labeled datasets about 26 campaigns, which contain both IO posts verified by a social media platform and over 13M posts by 303k accounts that discussed similar topics in the same time frames (control data). The datasets will facilitate the study of narratives, network interactions, and engagement strategies employed by coordinated accounts across various campaigns and countries. By comparing these coordinated accounts against organic ones, researchers can develop and benchmark IO detection algorithms.

Paper Structure

This paper contains 15 sections, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Data Collection and Curation Pipeline
  • Figure 2: Data Coverage. Left: Percentage of accounts mentioned/replied to/reposted by IO accounts that can be found in the control dataset. Middle: Percentage of IO hashtags that were also used by control accounts. Right: Percentage of days that IO accounts posted and that were covered by control accounts. Dashed lines indicate median values.
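
The coverage percentages described in the Figure 2 caption can be illustrated with a small sketch. This is not the authors' code: the function name `coverage_pct`, the toy hashtag lists, and the choice to count distinct items rather than occurrences are all assumptions made for illustration. The same overlap calculation would apply to accounts or posting days instead of hashtags.

```python
def coverage_pct(io_items, control_items):
    """Percentage of distinct IO items that also appear among control items.

    Hypothetical helper: counting distinct items (not occurrences) is an
    assumption, not something the caption specifies.
    """
    io_set = set(io_items)
    if not io_set:
        return 0.0  # no IO items to cover
    return 100.0 * len(io_set & set(control_items)) / len(io_set)


# Toy data, invented for illustration only.
io_hashtags = ["#vote", "#news", "#rally", "#vote"]
control_hashtags = ["#news", "#vote", "#sports"]

# Two of the three distinct IO hashtags also appear in the control set.
hashtag_coverage = coverage_pct(io_hashtags, control_hashtags)
```

Computing this value once per campaign and taking the median across campaigns would give a summary comparable to the dashed lines mentioned in the caption, under the same assumptions.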