Unmasking Social Bots: How Confident Are We?

James Giroux; Ariyarathne Gangani; Alexander C. Nwala; Cristiano Fanelli

TL;DR

This paper addresses the challenge of detecting social bots while explicitly quantifying uncertainty at the account level. It proposes the first fully Bayesian deep learning framework for bot detection that separates epistemic and aleatoric uncertainty using Multiplicative Normalizing Flows, and tests it with two feature streams: BLOC and Botometer. The approach yields uncertainty-aware predictions and demonstrates that applying a $3\sigma$ uncertainty threshold can improve accuracy metrics while enabling targeted interventions. Empirical results show competitive AUC against deterministic baselines, with uncertainty information offering practical benefits for reliability and decision-making in moderation pipelines. Overall, the work advances end-to-end uncertainty-aware bot detection applicable to real-world social media monitoring and research.

Abstract

Social bots remain a major vector for spreading disinformation on social media and a menace to the public. Despite the progress made in developing multiple sophisticated social bot detection algorithms and tools, bot detection remains a challenging, unsolved problem that is fraught with uncertainty due to the heterogeneity of bot behaviors, training data, and detection algorithms. Detection models often disagree on whether to label the same account as bot or human-controlled, yet they do not provide any measure of uncertainty to indicate how much we should trust their results. We propose to address both bot detection and the quantification of uncertainty at the account level, a novel feature of this research. This dual focus is crucial because it allows us to leverage the quantified uncertainty of each prediction, thereby enhancing decision-making and improving the reliability of bot classifications. Specifically, our approach facilitates targeted interventions for bots when predictions are made with high confidence and suggests caution (e.g., gathering more data) when predictions are uncertain.
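The confidence-based triage described above, together with the $3\sigma$ threshold mentioned in the TL;DR, can be illustrated with a minimal sketch. It assumes that a prediction is kept only when the mean bot probability lies more than $3\sigma$ from the decision boundary at 0.5, with $\sigma$ the epistemic and aleatoric components added in quadrature. The function name `triage`, its parameters, and this exact acceptance rule are illustrative assumptions and may differ from the authors' implementation.

```python
import numpy as np

def triage(p_bot, sigma_epistemic, sigma_aleatoric, n_sigma=3.0, boundary=0.5):
    """Label each account 'bot', 'human', or 'uncertain' (hypothetical helper).

    Assumption: a prediction counts as confident only if its mean probability
    is more than n_sigma total standard deviations from the decision boundary.
    """
    p_bot = np.asarray(p_bot, dtype=float)
    sigma_epistemic = np.asarray(sigma_epistemic, dtype=float)
    sigma_aleatoric = np.asarray(sigma_aleatoric, dtype=float)

    # Combine the two uncertainty components in quadrature.
    sigma_total = np.sqrt(sigma_epistemic**2 + sigma_aleatoric**2)

    # Confident predictions sit far from the boundary relative to sigma_total.
    confident = np.abs(p_bot - boundary) > n_sigma * sigma_total

    labels = np.where(p_bot > boundary, "bot", "human")
    return np.where(confident, labels, "uncertain"), sigma_total


labels, sigma_total = triage(
    p_bot=[0.95, 0.55, 0.05],
    sigma_epistemic=[0.03, 0.04, 0.03],
    sigma_aleatoric=[0.04, 0.03, 0.04],
)
```

In this toy input the second account sits close to the boundary relative to its uncertainty, so it would be routed to the "gather more data" path rather than acted on.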

Paper Structure

This paper contains 12 sections, 9 equations, 6 figures, and 6 tables.

Introduction
Related Works
Bot Detection
Uncertainty Quantification in Bot Detection
Methods
Feature Extraction
BLOC
Botometer
Bayesian Neural Networks
Datasets and Experimental Setup
Results
Conclusions

Figures (6)

Figure 1: Impact of Bot Crisis and Uncertainty on Predictions (Our Work): As stated by Musk through X (left), the ability of bots to replicate human behavior and bypass security measures has increased dramatically with the advent of AI. Bot accounts are able to mask themselves more efficiently within the human population on social media platforms. This is shown through dimensionally reduced representations (right), in which we show three planes: (i) the true distributions, where we introduce an offset between human and bot points to ease visualization; (ii) the expected probability of an account being a bot as produced by our network; and (iii) the associated uncertainty across the feature space, represented by the epistemic and aleatoric components added in quadrature. For (ii) and (iii) we use Gaussian process regression (Rasmussen, 2003) for visualization purposes. Uncertainty is greater in regions where ambiguity is higher and the two classes overlap.
Figure 2: BLOC Process Summary: (a) BLOC action and content strings for three users, @NASA, @Alice, and @Bob. Using the action alphabet, the sequence of three tweets (a reply, an original tweet, and a retweet) by @NASA can be represented by three letters $p.T.r$ separated by dots (long pauses). Using the content alphabet, it can be represented by the sets of strings $(Emt)(mmt)(mmmmmUt)$ enclosed in parentheses. (b) Once generated, the BLOC strings are tokenized into words, which are subsequently used to (c) generate a matrix that serves as input to the BNN and DNN.
Figure 3: Analysis Pipeline: Schematic representation of uncertainty-aware decision making in bot detection. The Bayesian Neural Network (BNN) structure is characterized by Multiplicative Normalizing Flows (MNF) (Louizos and Welling, 2017), batch normalization, and SELU activation functions (Klambauer et al., 2017). The output of the network is the probability of a bot account, along with the epistemic and aleatoric uncertainties. These uncertainties can be combined in quadrature.
Figure 4: Overlaid ROC Curves: Receiver Operating Characteristic (ROC) curves for the Bayesian Neural Network (BNN), Deep Neural Network (DNN), and Random Forest (RF), trained on BLOC features (a) and Botometer features (b). The uncertainty band on the BNN curves is obtained through a bootstrapping method, in which we sample the posterior over the weights to obtain uncertainties on the False Positive Rate (FPR) and True Positive Rate (TPR) at each threshold. Note that the DNN and BNN perform consistently within error. RF outperforms the networks due to its greater ability to operate efficiently on datasets with lower statistics.
Figure 5: Uncertainty as a Function of Probability: Epistemic uncertainty, aleatoric uncertainty, and the two in quadrature as a function of model probability for the models trained on (a) BLOC features (top row) and (b) Botometer features (bottom row). Note the parabolic-like shape of the epistemic distribution, with maximum uncertainty around the decision boundary ($\langle p_{\mathrm{bot}} \rangle = 0.5$). For a well-calibrated Bayesian model, this is the expected behavior of the epistemic uncertainty. The aleatoric uncertainty is dictated by the available data, and therefore there is no expectation on its distribution. Adding the two uncertainties in quadrature produces a convolution of the epistemic and aleatoric distributions.
...and 1 more figure
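The uncertainty band described in the Figure 4 caption can be sketched as follows. This is one plausible reading of "sampling the posterior over the weights", not the authors' code: it assumes that each posterior weight draw has already been turned into a vector of bot probabilities, so that `prob_samples` has shape `(n_draws, n_accounts)`. The function name `roc_band` and its return keys are hypothetical.

```python
import numpy as np

def roc_band(prob_samples, y_true, thresholds):
    """Mean and spread of TPR/FPR at each threshold across posterior draws.

    prob_samples: (n_draws, n_accounts) bot probabilities, one row per
                  posterior weight draw (assumed to be precomputed).
    y_true:       (n_accounts,) ground truth, 1 for bot and 0 for human.
    thresholds:   (n_thresholds,) decision thresholds to evaluate.
    """
    prob_samples = np.asarray(prob_samples, dtype=float)
    y_true = np.asarray(y_true).astype(bool)
    thresholds = np.asarray(thresholds, dtype=float)

    # Boolean predictions with shape (n_draws, n_thresholds, n_accounts).
    pred = prob_samples[:, None, :] >= thresholds[None, :, None]

    # Rates per draw and threshold, each with shape (n_draws, n_thresholds).
    tpr = (pred & y_true).sum(axis=2) / y_true.sum()
    fpr = (pred & ~y_true).sum(axis=2) / (~y_true).sum()

    return {
        "tpr_mean": tpr.mean(axis=0), "tpr_std": tpr.std(axis=0),
        "fpr_mean": fpr.mean(axis=0), "fpr_std": fpr.std(axis=0),
    }


band = roc_band(
    prob_samples=[[0.9, 0.8, 0.2, 0.1],
                  [0.9, 0.4, 0.6, 0.1]],
    y_true=[1, 1, 0, 0],
    thresholds=[0.5],
)
```

Plotting `tpr_mean` against `fpr_mean` with the standard deviations as error bands would give a curve of the kind the caption describes, under the assumptions stated above.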
