Unraveling the Web of Disinformation: Exploring the Larger Context of State-Sponsored Influence Campaigns on Twitter

Mohammad Hammas Saeed, Shiza Ali, Pujan Paudel, Jeremy Blackburn, Gianluca Stringhini

TL;DR

The paper addresses state-sponsored disinformation campaigns on Twitter by proposing a campaign-agnostic detection framework that generalizes to unseen campaigns. It identifies traits shared across 19 campaigns, translates them into a four-modal feature set (user attributes, temporal patterns, stylometry, and source information), and trains a Random Forest classifier that achieves up to 98.5% accuracy on balanced data and correctly identifies up to 94% of accounts from unseen campaigns. The authors validate the approach on a large-scale dataset (over 200 million tweets), flag 116 potentially malicious accounts in the wild, and present case studies comparing those accounts with ones from known campaigns. The work argues that relying on multiple signals makes the system harder to evade, discusses implications for automated safety features on social platforms, and acknowledges limitations such as API access constraints and language biases, along with avenues for future research.
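Detection on unseen campaigns is commonly measured by holding out one campaign at a time. The sketch below illustrates that idea only; the summary does not spell out the paper's exact protocol, and the function name `leave_one_campaign_out` and the `(account_id, campaign)` input shape are assumptions made for illustration. Benign accounts would need their own train/test split, which is omitted here.

```python
def leave_one_campaign_out(accounts):
    """Yield (held_out_campaign, train_ids, test_ids) for each campaign.

    `accounts` is a list of (account_id, campaign) pairs. This is a
    hypothetical input format, not the paper's data layout.
    """
    campaigns = sorted({campaign for _, campaign in accounts})
    for held_out in campaigns:
        # Train on every campaign except the held-out one.
        train_ids = [a for a, c in accounts if c != held_out]
        # Test only on accounts from the held-out (unseen) campaign.
        test_ids = [a for a, c in accounts if c == held_out]
        yield held_out, train_ids, test_ids
```

A classifier trained on `train_ids` never sees accounts from the held-out campaign, so its score on `test_ids` approximates performance on a previously unseen campaign.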

Abstract

Social media platforms offer unprecedented opportunities for connectivity and exchange of ideas; however, they also serve as fertile ground for the dissemination of disinformation. Over the years, there has been a rise in state-sponsored campaigns aiming to spread disinformation and sway public opinion on sensitive topics through designated accounts, known as troll accounts. Prior work on detecting accounts belonging to state-backed operations focuses on a single campaign. While campaign-specific detection techniques are easier to build, no prior work develops campaign-agnostic systems that offer generalized detection of troll accounts, unaffected by the biases of the specific campaign they belong to. In this paper, we identify several strategies adopted across different state actors and present a system that leverages them to detect accounts from previously unseen campaigns. We study 19 state-sponsored disinformation campaigns that took place on Twitter, originating from various countries. The strategies include sending automated messages through popular scheduling services, retweeting and sharing selective content, and using fake versions of verified applications for pushing content. By translating these traits into a feature set, we build a machine learning-based classifier that can correctly identify up to 94% of accounts from unseen campaigns. Additionally, we run our system in the wild and find more accounts that could potentially belong to state-backed operations. We also present case studies to highlight the similarity between the accounts found by our system and those identified by Twitter.
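To make the idea of translating these traits into features concrete, here is a minimal sketch that computes three per-account signals mentioned in the abstract and figure captions: the share of tweets sent through scheduling services, the share that are retweets, and the number of distinct sources used. The `Tweet` type, the `account_features` function, and the contents of `SCHEDULING_SOURCES` are assumptions for illustration and are not the paper's actual feature definitions or source list.

```python
from collections import Counter
from dataclasses import dataclass

# Assumption: an illustrative set of scheduling clients, not the paper's list.
SCHEDULING_SOURCES = {"TweetDeck", "Hootsuite", "Buffer"}


@dataclass
class Tweet:
    text: str
    source: str       # client label attached to the tweet, e.g. "Twitter for Android"
    is_retweet: bool


def account_features(tweets):
    """Return simple behavioural ratios for one account's tweets."""
    n = len(tweets)
    if n == 0:
        # Accounts with no tweets get neutral values instead of dividing by zero.
        return {"scheduled_share": 0.0, "retweet_share": 0.0, "distinct_sources": 0}
    sources = Counter(t.source for t in tweets)
    scheduled = sum(c for s, c in sources.items() if s in SCHEDULING_SOURCES)
    retweets = sum(1 for t in tweets if t.is_retweet)
    return {
        "scheduled_share": scheduled / n,   # fraction sent via scheduling clients
        "retweet_share": retweets / n,      # fraction that are retweets
        "distinct_sources": len(sources),   # number of different clients used
    }
```

Vectors like these, combined with user, temporal, and stylometric features, could then be passed to any standard classifier; the paper reports using a Random Forest.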

Paper Structure (13 sections, 7 figures, 7 tables)

Figures (7)

  • Figure 1: The graphs show (a) the percentage of messages sent through scheduling applications and (b) the percentage of messages that are retweets.
  • Figure 2: Tweet from "Twitter for Android" source.
  • Figure 3: CDF of number of sources used by real users and accounts from all campaigns.
  • Figure 4: The time graphs show the same applications being used around the same time period by different campaigns.
  • Figure 5: Overview of the system. Two input streams are fed to the system: (1) a dataset of known state-sponsored malicious accounts and (2) a set of benign accounts. The system extracts features from both sets of accounts. Next, a detection model is built using the identified features and used to detect unseen accounts in the wild. Finally, the system alerts users to potentially malicious accounts in the wild, and those accounts can be moderated accordingly.
  • ...and 2 more figures