Unsupervised detection of coordinated information operations in the wild

D. Hudson Smith; Carl Ehrett; Patrick L. Warren

Abstract

This paper introduces and tests an unsupervised method for detecting novel coordinated inauthentic information operations (CIOs) in realistic settings. This method uses Bayesian inference to identify groups of accounts that share similar account-level characteristics and target similar narratives. We solve the inferential problem using amortized variational inference, allowing us to efficiently infer group identities for millions of accounts. We validate this method using a set of five CIOs from three countries discussing four topics on Twitter. Our unsupervised approach increases detection power (area under the precision-recall curve) relative to a naive baseline (by a factor of 76 to 580) and relative to the use of simple flags or narratives on their own (by a factor of 1.3 to 4.8), and it comes quite close to a supervised benchmark. Our method is robust to observing only a small share of messaging on the topic, to having only weak markers of inauthenticity, and to CIO accounts making up a tiny share of messages and accounts on the topic. Although we evaluate the results on Twitter, the method is general enough to be applied in many social-media settings.
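The detection-power comparison above is a ratio of areas under the precision-recall curve. A minimal sketch of that arithmetic follows: it computes step-wise average precision (one common estimator of that area) and divides it by the share of CIO accounts. Two assumptions are made here: the naive baseline is taken to be a random ranking, whose expected average precision is approximately the prevalence of CIO accounts; and the labels and scores are invented for illustration, not drawn from the paper's data. The helper names `average_precision` and `lift_over_random` are hypothetical.

```python
def average_precision(labels, scores):
    """Step-wise average precision: the mean of the precision values reached
    at each true positive when accounts are ranked by descending score.
    Ties keep their input order (Python's sort is stable)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive (CIO) account")
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def lift_over_random(labels, scores):
    """Ratio of average precision to an assumed naive baseline: a random
    ranking, whose expected average precision is roughly the prevalence."""
    prevalence = sum(labels) / len(labels)
    return average_precision(labels, scores) / prevalence


# Hypothetical data: 1 = CIO account, 0 = other account.
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.70, 0.40, 0.30, 0.20, 0.15, 0.10, 0.05, 0.01]

ap = average_precision(labels, scores)
lift = lift_over_random(labels, scores)
print(f"AP = {ap:.3f}, lift over random = {lift:.2f}")
```

Here 2 of 10 accounts are CIO accounts, so the baseline is 0.2 and ranking both near the top gives a lift of about 4. When CIO accounts are a tiny share of all accounts, the baseline becomes very small, so even moderate average precision translates into a large factor.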

Paper Structure

This paper contains 29 sections, 32 equations, 10 figures, and 5 tables.

Introduction
The Problem
Information Environment
Coordinated Contributors
Prior Knowledge of Coordinated Behavior
Approach
Materials and Methods
Bayesian Model for Unsupervised Detection
Model Specification
Inference Procedure
Narrative Feature Selection
Results
Topics, Narratives, and Flags
Full Model Results
Simplified Model Results

Figures (10)

Figure S1: Precision-recall curves for unsupervised detection of CIO accounts, parameterized by a probabilistic detection threshold above which accounts are considered to belong to the CIO. For high thresholds, the model makes precise predictions but misses many CIO accounts (top left of graphs). For low thresholds, the model catches most CIO accounts but has a high false-positive rate (bottom right of graphs). The faint dashed lines correspond to individual members of the ensemble of models. The solid lines come from averaging the predictions for each account across the ensemble. Precision baselines are given in the average-precision table (tab:avg_prec_scores).
Figure S2: Posterior rate coefficients for flags and the top-8 narratives in the Xinjiang dataset. Supervised and unsupervised results are shown, with unsupervised group identities matched to the known CIO groups.
Figure S3: Posterior probability of CIO affiliation for a randomly selected CIO account, as a function of the share of observed accounts affiliated with the CIO.
Figure S4: Posterior probability of CIO affiliation for randomly selected CIO and non-CIO accounts, as a function of total number of observed accounts. Left: Using the narrative (#xinjiang). Right: Using the narrative (#unhumanrightscouncil).
Figure S5: Graphical representation of the generative process for flags and narratives.
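Figures S3 and S5 refer to a generative model of flags and narratives and to posterior probabilities of CIO affiliation. The following is a toy sketch, not the paper's model. It assumes that each account belongs to one of two groups, that each flag or narrative count is Poisson with a group-specific rate, and that those rates are already known; the paper instead infers group identities with amortized variational inference. All rates, counts, and group shares are hypothetical, as is the helper `posterior_membership`. Under these assumptions, posterior affiliation is the prior group share times the likelihood, normalized over groups.

```python
import math


def log_poisson(k, rate):
    """Log-probability of count k under a Poisson distribution with this rate."""
    return k * math.log(rate) - rate - math.lgamma(k + 1)


def posterior_membership(counts, rates, shares):
    """Posterior probability that one account belongs to each group.

    counts: the account's observed counts, one per feature (flags, narratives)
    rates:  rates[g][j] is the assumed Poisson rate of feature j in group g
    shares: prior share of accounts in each group (all must be > 0)
    """
    log_joint = [
        math.log(share) + sum(log_poisson(k, r) for k, r in zip(counts, group_rates))
        for share, group_rates in zip(shares, rates)
    ]
    m = max(log_joint)  # subtract the max before exponentiating (log-sum-exp)
    weights = [math.exp(v - m) for v in log_joint]
    z = sum(weights)
    return [w / z for w in weights]


# Hypothetical features: [an inauthenticity flag, narrative A, narrative B].
rates = [
    [0.1, 0.2, 1.0],  # group 0: assumed rates for ordinary accounts
    [0.8, 3.0, 0.5],  # group 1: assumed rates for CIO accounts
]
counts = [1, 4, 0]  # one account's observed counts

results = {}
for cio_share in [0.1, 0.001, 0.00001]:
    post = posterior_membership(counts, rates, [1 - cio_share, cio_share])
    results[cio_share] = post[1]
    print(f"CIO share {cio_share:g}: P(CIO | account) = {post[1]:.3f}")
```

In this toy, holding the account's counts fixed and shrinking the prior CIO share lowers the posterior, the kind of dependence Figure S3 examines. Strong evidence from flags and narratives offsets a small share only up to a point.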