Identifying Coordinated Activities on Online Social Networks Using Contrast Pattern Mining

Isura Manchanayaka, Zainab Zaidi, Shanika Karunasekera, Christopher Leckie

TL;DR

The paper tackles the problem of detecting coordinated activity on online social networks by treating coordination as abnormal growth in behavioural patterns over time. It introduces a framework built on contrast pattern mining, using EPClose to extract closed contrast patterns from two time windows, a background window $D_b$ and a target window $D_t$, and quantifies pattern growth via $gr(X,D_t,D_b)=\frac{supp(X,D_t)}{supp(X,D_b)}$ and $supp_\delta(X,D_t,D_b)$. The approach is evaluated on real-world data from the campaign by Russia's Internet Research Agency (IRA) to influence the 2016 US elections, augmented with noisy background data. It achieves F1 scores of up to $86\%$ and surpasses several baselines by more than $10\%$ in F1 score, while remaining memory-efficient. Contributions include (1) formalizing contrast pattern mining for coordination detection, (2) proposing a practical framework and applying it to real data, and (3) conducting extensive parameter and ablation analyses that show the importance of temporal attributes. The work highlights the potential of using growth in behavioural patterns to identify coordinating users, and points to future work on incorporating richer attributes (sentiment, topics) and on determining time intervals automatically for real-time deployment.
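To make the growth-rate definition concrete, here is a minimal Python sketch of $supp$ and $gr$ on toy data. It is not the paper's implementation and does not run EPClose; it only evaluates the ratio for a pattern that is already given. The transactions, the attribute names (`user=...`, `hashtag=...`), and the convention for a zero background support (infinite growth when the pattern appears only in the target window, zero when it appears in neither) are assumptions made for illustration.

```python
from math import inf


def support(pattern, transactions):
    """Fraction of transactions that contain every item in `pattern`."""
    if not transactions:
        return 0.0
    pattern = frozenset(pattern)
    hits = sum(1 for t in transactions if pattern <= t)
    return hits / len(transactions)


def growth_rate(pattern, target, background):
    """gr(X, D_t, D_b) = supp(X, D_t) / supp(X, D_b).

    Assumed convention: if the background support is zero, return
    infinity when the pattern occurs in the target and 0.0 otherwise.
    """
    s_t = support(pattern, target)
    s_b = support(pattern, background)
    if s_b == 0:
        return inf if s_t > 0 else 0.0
    return s_t / s_b


# Hypothetical toy windows: each transaction is a set of attribute=value items.
background = [
    frozenset({"user=a", "hashtag=#news"}),
    frozenset({"user=b", "hashtag=#vote"}),
    frozenset({"user=c", "hashtag=#news"}),
    frozenset({"user=a", "hashtag=#vote"}),
]
target = [
    frozenset({"user=a", "hashtag=#vote"}),
    frozenset({"user=b", "hashtag=#vote"}),
    frozenset({"user=c", "hashtag=#vote"}),
    frozenset({"user=c", "hashtag=#news"}),
]

# Support rises from 0.5 to 0.75, so the growth rate is 1.5.
print(growth_rate({"hashtag=#vote"}, target, background))
# This pattern is absent from the background window, so growth is infinite.
print(growth_rate({"user=c", "hashtag=#vote"}, target, background))
```

A pattern whose growth rate is large is one that became much more frequent in the target window than in the background window, which is the kind of signal the framework associates with coordinating users.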

Abstract

The proliferation of misinformation and disinformation on social media networks has become increasingly concerning. With a significant portion of the population using social media on a regular basis, there are growing efforts by malicious organizations to manipulate public opinion through coordinated campaigns. Current methods for identifying coordinated user accounts typically rely on either similarities in user behaviour, latent coordination in activity traces, or classification techniques. In our study, we propose a framework based on the hypothesis that coordinated users will demonstrate abnormal growth in their behavioural patterns over time relative to the wider population. Specifically, we utilize the EPClose algorithm to extract contrasting patterns of user behaviour during a time window of malicious activity, which we then compare to a historical time window. We evaluated the effectiveness of our approach using real-world data, and our results show a minimum increase of 10% in the F1 score compared to existing approaches.

Paper Structure (22 sections, 5 figures, 4 tables)

Figures (5)

  • Figure 1: Overview of the proposed framework to identify suspected coordinated user accounts based on contrast pattern mining. Boxes with rounded corners represent a process while rectangular boxes represent data. The dark circles denote coordinating users and empty circles denote normal users.
  • Figure 2: Stacked distribution of IRA activities and extracted noise tweets across time. The bin size for the x-axis is 1 million seconds ($\sim$11.6 days). The red vertical line shows the election date.
  • Figure 3: Precision, recall, F1 score, and the number of contrast patterns associated with users ($\left|\mathcal{P}_{\{user\}}\right|$) as $\sigma$ (left) and $\rho$ (right) vary. Results with the maximum F1 score are circled in the 3rd row. Target time period: 2016/07 - 2016/11. Background time period: 2015/01 - 2015/05.
  • Figure 4: The distribution of purity values for each identified behavioural pattern. The size of each marker is proportional to the number of users associated with each behavioural pattern.
  • Figure 5: F1 scores and the number of contrast patterns associated with users ($\left|\mathcal{P}_{\{user\}}\right|$) when the highest-impact attribute is removed from (left) or added to (right) the attribute set. $n_C=n_N=400$.