Unmasking Social Bots: How Confident Are We?
James Giroux, Ariyarathne Gangani, Alexander C. Nwala, Cristiano Fanelli
TL;DR
This paper addresses the challenge of detecting social bots while explicitly quantifying uncertainty at the account level. It proposes what the authors present as the first fully Bayesian deep learning framework for bot detection. The framework uses Multiplicative Normalizing Flows to separate epistemic (model) uncertainty from aleatoric (data) uncertainty, and it is evaluated on two feature sets: BLOC and Botometer. The approach yields uncertainty-aware predictions, and applying a $3\sigma$ uncertainty threshold is shown to improve accuracy metrics while enabling targeted interventions. Empirically, the models reach AUC competitive with deterministic baselines, and the uncertainty estimates offer practical benefits for reliability and decision-making in moderation pipelines. Overall, the work advances end-to-end, uncertainty-aware bot detection for real-world social media monitoring and research.
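To make the epistemic/aleatoric split and the $3\sigma$ rule concrete, here is a minimal sketch. It is not the authors' implementation. It assumes we already have several bot probabilities $p_w$ for one account, each from a forward pass with a different weight sample $w$ drawn from the approximate posterior (for example, an MNF network). By the law of total variance, the predictive variance of a binary label splits into $\mathrm{Var}_w[p_w]$ (epistemic) plus $\mathbb{E}_w[p_w(1-p_w)]$ (aleatoric). The decision rule is also an assumption: a label is emitted only when $\bar p \pm 3\sigma$ stays on one side of 0.5, with $\sigma$ taken as the epistemic standard deviation. The function names `decompose` and `triage` are hypothetical.

```python
from statistics import fmean, pvariance
from typing import List, NamedTuple


class Decomposition(NamedTuple):
    mean_prob: float  # posterior-mean bot probability, p_bar
    epistemic: float  # Var_w[p_w]: disagreement between weight samples
    aleatoric: float  # E_w[p_w * (1 - p_w)]: expected Bernoulli label noise


def decompose(probs: List[float]) -> Decomposition:
    """Split the predictive variance of a binary label (law of total variance).

    `probs` holds bot probabilities from forward passes with different sampled
    weights (hypothetical input, e.g. Monte Carlo draws from an MNF posterior).
    epistemic + aleatoric equals p_bar * (1 - p_bar) up to rounding.
    """
    mean_p = fmean(probs)
    epistemic = pvariance(probs, mu=mean_p)
    aleatoric = fmean(p * (1.0 - p) for p in probs)
    return Decomposition(mean_p, epistemic, aleatoric)


def triage(probs: List[float], k: float = 3.0, boundary: float = 0.5) -> str:
    """Return 'bot', 'human', or 'uncertain' under an assumed k-sigma rule.

    A label is emitted only if p_bar +/- k * sigma lies entirely on one side
    of the decision boundary, where sigma is the epistemic standard deviation.
    """
    d = decompose(probs)
    sigma = d.epistemic ** 0.5
    if d.mean_prob - k * sigma > boundary:
        return "bot"
    if d.mean_prob + k * sigma < boundary:
        return "human"
    return "uncertain"
```

With these definitions, `triage([0.95, 0.97, 0.96, 0.94])` returns `"bot"` and `triage([0.05, 0.03, 0.04])` returns `"human"`, because the weight samples agree. `triage([0.2, 0.9, 0.4, 0.8])` returns `"uncertain"`: the mean is above 0.5, but the samples disagree so strongly that the $3\sigma$ band crosses the boundary. Such an account would be set aside for caution (e.g., gathering more data) rather than acted on.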
Abstract
Social bots remain a major vector for spreading disinformation on social media and a menace to the public. Despite the progress made in developing multiple sophisticated social bot detection algorithms and tools, bot detection remains a challenging, unsolved problem that is fraught with uncertainty due to the heterogeneity of bot behaviors, training data, and detection algorithms. Detection models often disagree on whether to label the same account as bot or human-controlled. However, they do not provide any measure of uncertainty to indicate how much we should trust their results. We propose to address both bot detection and the quantification of uncertainty at the account level, a novel feature of this research. This dual focus is crucial because it allows us to leverage additional information related to the quantified uncertainty of each prediction, thereby enhancing decision-making and improving the reliability of bot classifications. Specifically, our approach facilitates targeted interventions for bots when predictions are made with high confidence and suggests caution (e.g., gathering more data) when predictions are uncertain.
