Using Causality to Infer Coordinated Attacks in Social Media

Isura Manchanayaka, Zainab Razia Zaidi, Shanika Karunasekera, Christopher Leckie

TL;DR

This work reframes coordinated manipulation on social media as a causal inference problem and applies Convergent Cross Mapping (CCM) to user activity traces, with topic modeling added to improve efficiency. Evaluated on the IRA dataset and a COVID-19 Twitter case study, CCM identifies coordinated accounts with F1 scores up to 75.3% in certain settings and surfaces the most influential members of a community, offering a causality-based alternative to theme- or network-centric methods. The key contributions are a CCM-based pipeline for detecting coordination, an optimization that uses Non-negative Matrix Factorization (NMF) for topic clustering to reduce the search space, and comparisons against baselines such as Granger causality. The approach gives insight into leadership and information diffusion within coordinated campaigns, with implications for platform defense and content moderation. Overall, the paper shows CCM's promise for uncovering causal structures of coordinated behavior in large-scale social media data, while noting both the performance gains and the scalability challenges.

Abstract

The rise of social media has been accompanied by a dark side with the ease of creating fake accounts and disseminating misinformation through coordinated attacks. Existing methods to identify such attacks often rely on thematic similarities or network-based approaches, overlooking the intricate causal relationships that underlie coordinated actions. This work introduces a novel approach for detecting coordinated attacks using Convergent Cross Mapping (CCM), a technique that infers causality from temporal relationships between user activity. We build on the theoretical framework of CCM by incorporating topic modelling as a basis for further optimizing its performance. We apply CCM to real-world data from the infamous IRA attack on US elections, achieving F1 scores up to 75.3% in identifying coordinated accounts. Furthermore, we analyse the output of our model to identify the most influential users in a community. We apply our model to a case study involving COVID-19 anti-vax related discussions on Twitter. Our results demonstrate the effectiveness of our model in uncovering causal structures of coordinated behaviour, offering a promising avenue for mitigating the threat of malicious campaigns on social media platforms.
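
To make the idea of cross mapping concrete, the following is a minimal sketch of a CCM-style skill estimate in plain Python. It is not the authors' implementation. The toy data (two coupled logistic maps standing in for the activity of a leading user `u1` and a following user `u2`), the embedding dimension `E`, the lag `tau`, the `coupling` strength, and all function names are assumptions made for illustration. The sketch predicts one series from the delay-embedded history (shadow manifold) of the other and reports the correlation between predictions and actual values; under CCM, a correlation that grows with the library length `L` is read as evidence of a causal link.

```python
import math


def simulate(n, coupling=0.1):
    """Toy data (assumption): u1 evolves on its own and drives u2, never the reverse."""
    u1, u2 = [0.4], [0.2]
    for _ in range(n - 1):
        a, b = u1[-1], u2[-1]
        u1.append(a * (3.8 - 3.8 * a))
        u2.append(b * (3.5 - 3.5 * b - coupling * a))
    return u1, u2


def embed(series, E, tau):
    """Delay embedding: (time index, lagged vector) for every index with a full history."""
    start = (E - 1) * tau
    return [(t, [series[t - k * tau] for k in range(E)])
            for t in range(start, len(series))]


def pearson(a, b):
    """Pearson correlation, clamped to [-1, 1] to absorb rounding error."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return max(-1.0, min(1.0, cov / math.sqrt(var_a * var_b)))


def cross_map_skill(manifold_series, target, L, E=2, tau=1):
    """Predict `target` from the shadow manifold of `manifold_series`,
    using only the first L samples, and return the prediction correlation."""
    points = embed(manifold_series[:L], E, tau)
    actual, predicted = [], []
    for t, vec in points:
        # E + 1 nearest neighbours on the manifold, excluding the point itself.
        nearest = sorted((math.dist(vec, other), s)
                         for s, other in points if s != t)[:E + 1]
        d_min = max(nearest[0][0], 1e-12)
        weights = [math.exp(-d / d_min) for d, _ in nearest]
        estimate = sum(w * target[s] for w, (_, s) in zip(weights, nearest))
        predicted.append(estimate / sum(weights))
        actual.append(target[t])
    return pearson(actual, predicted)


if __name__ == "__main__":
    u1, u2 = simulate(400)
    for L in (50, 100, 200, 400):
        # Columns: L, skill for u1 given u2's manifold, skill for u2 given u1's manifold.
        print(L,
              round(cross_map_skill(u2, u1, L), 3),
              round(cross_map_skill(u1, u2, L), 3))
```

Because `u1` drives `u2` in this toy system, `u2`'s history carries information about `u1`, so the first printed skill column is the one expected to respond to longer libraries. Real activity traces are noisier than this deterministic example, and the neighbour search here is quadratic in `L`, which hints at why scaling to many user pairs is costly.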

Paper Structure (39 sections, 1 equation, 9 figures, 4 tables)

This paper contains 39 sections, 1 equation, 9 figures, 4 tables.

Figures (9)

  • Figure 1: Motivating example of the use of CCM to model causal behaviour in simulated social media data. We show how the correlation of predictions (vertical axis, denoted $\rho$) varies for two simulated users $u_1$ and $u_2$, where $u_2$ follows $u_1$ on social media, as the library length $L$ (i.e., sample period) increases. CCM implies causation if the correlation increases with increasing library length. The plotted series are: predictions for $u_1$ given $u_2$'s shadow manifold (i.e., history); predictions for $u_2$ given $u_1$'s shadow manifold; a linear regression fitted to $u_1$'s variation of correlation; and a linear regression fitted to $u_2$'s variation of correlation. (a) $u_2$ posts after $u_1$, who posts at regular intervals. (b) $u_2$ posts after $u_1$, who posts at irregular intervals. (c) $u_2$ posts after $u_1$, who posts at irregular intervals. However, $u_2$ also posts at random times without $u_1$ triggering $u_2$'s behaviour. (d) $u_1$ and $u_2$ behave randomly.
  • Figure 2: Stacked distribution of IRA activities and extracted noise tweets across time. The bin size for the x-axis is 1 million seconds ($\sim$11.6 days). The red vertical line shows the election date.
  • Figure 3: Influence graphs for $N_C=200, N_N=200$. Each edge was identified by CCM. Each edge is colored with the average of its two vertices' colors. (a) Pink vertices are known coordinating users. Green vertices are known normal users. (b) Vertex color represents the community identified by the method of Blondel_2008. (c) Vertex color represents the discussion topic of each user identified by CCM: General, Trump vs. Hillary, News, Democratic Party, or Emotions.
  • Figure 4: Influence graph for $N_C=200, N_N=200$. Each edge was identified by CCM after isolating user groups by topic. Each edge is colored with the average of its two vertices' colors. (a) Pink vertices are known coordinating users. Green vertices are known normal users. (b) Vertex color represents the discussion topic of each user: News, General, or Democratic Party.
  • Figure 5: Influence graph for $N_C=400, N_N=400$. Each edge was identified by CCM after isolating user groups by topic. Each edge is colored with the average of its two vertices' colors. (a) Pink vertices are known coordinating users. Green vertices are known normal users. (b) Vertex color represents the discussion topic of each user: General, Politics, or News.
  • ...and 4 more figures
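
Figures 4 and 5 show edges found by CCM after isolating user groups by topic. The sketch below illustrates, under stated assumptions, why that isolation shrinks the search space: causal tests are directional, so every ordered pair of users is a candidate, and restricting candidates to users who share a topic removes all cross-topic pairs. The user names, topic labels, and the `candidate_pairs` helper are hypothetical; the topic labels are assumed to be already available (for example, from a topic model such as NMF) and are not computed here.

```python
from collections import defaultdict


def candidate_pairs(topic_of_user):
    """Ordered (source, target) pairs restricted to users with the same topic label."""
    groups = defaultdict(list)
    for user, topic in topic_of_user.items():
        groups[topic].append(user)
    return [(a, b)
            for members in groups.values()
            for a in members
            for b in members
            if a != b]


# Hypothetical users and topic labels, for illustration only.
topics = {
    "user_a": "News", "user_b": "News", "user_c": "News",
    "user_d": "General", "user_e": "General",
    "user_f": "Politics",
}

n = len(topics)
all_pairs = n * (n - 1)                 # every ordered pair of distinct users
within_topic = len(candidate_pairs(topics))
print(all_pairs, within_topic)
```

The saving depends on how evenly users spread across topics: many small groups remove most pairs, while one dominant topic removes few.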