Unveiling Online Conspiracy Theorists: a Text-Based Approach and Characterization

Alessandra Recordare; Guglielmo Cola; Tiziano Fagni; Maurizio Tesconi

TL;DR

This work tackles the problem of identifying conspiracy theorists on social platforms using only writing style, not network metrics. It builds a text-based framework with three feature groups (emotions, idioms, and linguistic attributes) and applies it to 14,420 users (7,210 per class), after preprocessing the text and aggregating the features into user-level statistics. A suite of classifiers is evaluated: LightGBM trained on all features achieves an F1 of around 0.87, with linguistic and idiom features carrying most of the signal and emotion features contributing less. The study provides a practical approach to profiling conspiracy propagators and highlights the specific linguistic markers and idioms that distinguish them, which may inform disinformation mitigation and generalization to other platforms.

Abstract

In today's digital landscape, the proliferation of conspiracy theories within the disinformation ecosystem of online platforms represents a growing concern. This paper delves into the complexities of this phenomenon. We conducted a comprehensive analysis of two distinct X (formerly known as Twitter) datasets: one comprising users with conspiracy theorizing patterns and another made of users lacking such tendencies and thus serving as a control group. The distinguishing factors between these two groups are explored across three dimensions: emotions, idioms, and linguistic features. Our findings reveal marked differences in the lexicon and language adopted by conspiracy theorists with respect to other users. We developed a machine learning classifier capable of identifying users who propagate conspiracy theories based on a rich set of 871 features. The results demonstrate high accuracy, with an average F1 score of 0.88. Moreover, this paper unveils the most discriminating characteristics that define conspiracy theory propagators.
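
To make the user-level aggregation and the F1 metric mentioned above concrete, the following is a minimal sketch and not the authors' pipeline. It assumes each post has already been scored on some numeric features and that per-user mean and standard deviation serve as the user-level statistics. The feature name `anger` and the helper names `aggregate_user_features` and `f1_score` are hypothetical, and no classifier such as LightGBM is trained here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_user_features(posts):
    """Collapse per-post feature values into per-user mean and std.

    `posts` is an iterable of (user_id, {feature_name: value}) pairs.
    Using mean and population std is an assumption for illustration.
    """
    per_user = defaultdict(lambda: defaultdict(list))
    for user_id, feats in posts:
        for name, value in feats.items():
            per_user[user_id][name].append(value)

    aggregated = {}
    for user_id, feats in per_user.items():
        row = {}
        for name, values in feats.items():
            row[f"{name}_mean"] = mean(values)
            row[f"{name}_std"] = pstdev(values)  # 0.0 for a single post
        aggregated[user_id] = row
    return aggregated


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: two users, one emotion-style feature per post.
posts = [
    ("u1", {"anger": 0.2}),
    ("u1", {"anger": 0.4}),
    ("u2", {"anger": 0.1}),
]
user_rows = aggregate_user_features(posts)
```

Each row in `user_rows` could then be passed to any tabular classifier, and `f1_score` applied to its predictions on held-out users.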

Paper Structure (13 sections, 6 figures, 4 tables)

Introduction
Related work
Dataset description
Method
Features
Classification
Result and Discussion
Conspiracy users classification
Feature importance
Emotions
Idioms of conspiracy theorists
Linguistic features
Conclusions and future work

Figures (6)

Figure 1: SHAP values for the 20 most important features considering all features
Figure 2: F1 score based on the number of features used for classification (ordered by feature importance) considering all features
Figure 3: SHAP values for the 20 most important features in the emotions set
Figure 4: Descriptive statistics relative to the analyzed idioms, for control group users and conspiracy users
Figure 5: SHAP values for the 20 most important features in the linguistic set
...and 1 more figure
