Mapping the Russian Internet Troll Network on Twitter using a Predictive Model

Sachith Dassanayaka, Ori Swed, Dimitri Volchenkov

TL;DR

The paper tackles the challenge of mapping the Russian Internet Troll Network (ITN) on Twitter by constructing a predictive model that assigns actors to four authenticity-based categories (Fake News, Organizations, Political Affiliates, Individuals). It engineers eight account-level features, selects them via chi-square testing, normalizes the data, and uses a Random Forest with class balancing to achieve $88\%$ cross-validated accuracy, with external validation results of $90.7\%$ and $90.5\%$ on two additional datasets. The findings point to a language-agnostic, scalable approach that can map troll actors and their network operations across time, supporting its use for understanding influence operations and informing defense strategies. The work lays groundwork for extending to real-time tracking and cross-platform analyses, potentially improving detection of and response to disinformation campaigns.
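The chi-square screening step mentioned above can be illustrated with a minimal sketch. It computes the Pearson chi-square statistic for a contingency table of one binned feature against the category labels. The function name, the binning into "low" and "high", and the counts are hypothetical and for illustration only; they are not taken from the paper, whose exact feature-selection procedure is not given in this summary.

```python
from typing import Sequence


def chi_square_statistic(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for a (feature bin x category) count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the feature and the category were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:
                statistic += (observed - expected) ** 2 / expected
    return statistic


# Hypothetical counts: rows are feature bins (low, high), columns are two categories.
dependent = [[10, 20], [20, 10]]
independent = [[10, 10], [20, 20]]

print(round(chi_square_statistic(dependent), 3))
print(round(chi_square_statistic(independent), 3))
```

A larger statistic suggests a stronger association between the feature and the category label, so such a feature would be a more plausible candidate to keep; a statistic of zero means the observed counts match independence exactly.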

Abstract

Russian Internet Trolls use fake personas to spread disinformation through multiple social media streams. Given the increased frequency of this threat across social media platforms, understanding those operations is paramount in combating their influence. Using Twitter content identified as part of the Russian influence network, we created a predictive model to map the network operations. We classify account types based on their authenticity function for a sub-sample of accounts by introducing logical categories and training a predictive model to identify similar behavior patterns across the network. Our model attains 88% prediction accuracy on the test set. Validation is done by comparing the similarities with the 3 million Russian troll tweets dataset. The result indicates a 90.7% similarity between the two datasets. Furthermore, we compare our model predictions on a Russian tweets dataset, and the results show a 90.5% correspondence between the predictions and the actual categories. The prediction and validation results suggest that our predictive model can assist with mapping the actors in such networks.
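As a rough illustration of what a "correspondence between the predictions and the actual categories" could mean, the sketch below computes the share of accounts whose predicted category equals the reference category, overall and per category. This assumes simple label agreement; the abstract does not specify the exact measure the authors used, and the function names and sample labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def agreement_rate(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of accounts whose predicted category equals the reference category."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one account is required")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


def per_category_agreement(
    predicted: Sequence[str], actual: Sequence[str]
) -> Dict[str, float]:
    """Agreement rate computed separately for each reference category."""
    totals = Counter(actual)
    hits = Counter(a for p, a in zip(predicted, actual) if p == a)
    return {category: hits[category] / totals[category] for category in totals}


# Hypothetical labels for four accounts, using the category names from the summary.
predicted = ["Individuals", "Fake News", "Organizations", "Individuals"]
actual = ["Individuals", "Fake News", "Individuals", "Individuals"]

print(agreement_rate(predicted, actual))
print(per_category_agreement(predicted, actual))
```

Reporting the per-category rate alongside the overall rate helps when categories are imbalanced, since a high overall figure can hide weak agreement on a rare category.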

Paper Structure (11 sections, 3 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: Process of identifying categories for 21% hashed actors in the 1st dataset according to the four conceptual categories.
  • Figure 2: Frequency distribution of abstract categories for 2,408 actors in IRA English dataset.
  • Figure 3: Classification accuracy for each conceptual category under different classifiers with the selected eight features.
  • Figure 4: Change in the accuracy score (average F1-score) of the forest with the average depth of the trees.
  • Figure 5: Frequency distribution of conceptual categories for 2,832 actors in the 1st dataset, including all hashed accounts.
  • ...and 3 more figures