Unveiling Online Conspiracy Theorists: a Text-Based Approach and Characterization
Alessandra Recordare, Guglielmo Cola, Tiziano Fagni, Maurizio Tesconi
TL;DR
This work tackles the problem of identifying conspiracy theorists on social platforms using only writing style, not network metrics. It builds a text-based framework with three feature groups (emotions, idioms, and linguistic attributes) applied to 14,420 users (7,210 per class), with per-user texts preprocessed and aggregated into user-level statistics. A suite of classifiers is evaluated, with LightGBM on all features achieving an F1 around 0.87; linguistic and idiom features drive most of the signal, while emotions contribute less. The study provides a practical approach to profiling conspiracy propagators and highlights specific linguistic markers and idioms that distinguish them, offering insights for disinformation mitigation and cross-platform generalization.
Abstract
In today's digital landscape, the proliferation of conspiracy theories within the disinformation ecosystem of online platforms is a growing concern. This paper examines the phenomenon through the language of the users involved. We conducted a comprehensive analysis of two distinct X (formerly known as Twitter) datasets: one comprising users with conspiracy theorizing patterns and another of users without such tendencies, serving as a control group. The distinguishing factors between these two groups are explored across three dimensions: emotions, idioms, and linguistic features. Our findings reveal marked differences in the lexicon and language adopted by conspiracy theorists compared with other users. We developed a machine learning classifier that identifies users who propagate conspiracy theories based on a rich set of 871 features. The classifier performs strongly, with an average F1 score of 0.88. We also identify the most discriminating characteristics of conspiracy theory propagators.
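To make the user-level setup and the reported metric concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that per-post feature values are collapsed into a per-user mean and standard deviation (the summary above does not say which statistics were used), and it computes F1 for a binary labelling. The function names and the `anger` feature are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_by_user(posts):
    """Collapse per-post feature dicts into one feature dict per user.

    `posts` is an iterable of (user_id, {feature_name: value}) pairs.
    Assumption: each feature is summarized by its mean and population
    standard deviation; the paper may use different statistics.
    """
    per_user = defaultdict(lambda: defaultdict(list))
    for user_id, features in posts:
        for name, value in features.items():
            per_user[user_id][name].append(value)
    return {
        user_id: {
            f"{name}_{stat}": fn(values)
            for name, values in feats.items()
            for stat, fn in (("mean", mean), ("std", pstdev))
        }
        for user_id, feats in per_user.items()
    }


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: two posts by user "u1", one post by user "u2".
posts = [("u1", {"anger": 0.2}), ("u1", {"anger": 0.4}), ("u2", {"anger": 0.1})]
user_features = aggregate_by_user(posts)
```

In a setup like the one described, each user's aggregated dictionary would become one row of the feature matrix passed to a classifier, and F1 would be averaged over evaluation folds.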
