Unveiling Online Conspiracy Theorists: a Text-Based Approach and Characterization

Alessandra Recordare, Guglielmo Cola, Tiziano Fagni, Maurizio Tesconi

TL;DR

This work tackles the problem of identifying conspiracy theorists on social platforms using only writing style, not network metrics. It builds a text-based framework with three feature groups (emotions, idioms, and linguistic attributes) applied to 14,420 users (7,210 per class), after preprocessing the text and aggregating features into user-level statistics. Several classifiers are evaluated; LightGBM trained on all features achieves an F1 of around 0.87. Linguistic and idiom features carry most of the signal, while emotion features contribute less. The study offers a practical approach to profiling conspiracy propagators and highlights the specific linguistic markers and idioms that distinguish them, with implications for disinformation mitigation and cross-platform generalization.
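As a rough illustration of the "aggregation to user-level statistics" step, the sketch below collapses per-post feature values into one row per user. The choice of mean and standard deviation, the function name `aggregate_by_user`, and the feature names `anger` and `word_count` are assumptions made for this example; the summary does not state which statistics or feature names the authors use.

```python
from collections import defaultdict
from statistics import mean, pstdev


def aggregate_by_user(posts):
    """Collapse per-post features into per-user statistics.

    posts: iterable of (user_id, {feature_name: value}) pairs, one per post.
    Returns {user_id: {"<feature>_mean": ..., "<feature>_std": ...}}.
    Mean and population standard deviation are an assumed choice.
    """
    per_user = defaultdict(lambda: defaultdict(list))
    for user_id, features in posts:
        for name, value in features.items():
            per_user[user_id][name].append(value)

    aggregated = {}
    for user_id, columns in per_user.items():
        row = {}
        for name, values in columns.items():
            row[f"{name}_mean"] = mean(values)
            row[f"{name}_std"] = pstdev(values)  # 0.0 for a single post
        aggregated[user_id] = row
    return aggregated


# Hypothetical per-post feature values for two users.
posts = [
    ("u1", {"anger": 0.2, "word_count": 10}),
    ("u1", {"anger": 0.6, "word_count": 20}),
    ("u2", {"anger": 0.1, "word_count": 5}),
]
user_features = aggregate_by_user(posts)
```

Each resulting row is a fixed-length vector regardless of how many posts a user wrote, which is what lets a standard tabular classifier operate at the user level.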

Abstract

In today's digital landscape, the proliferation of conspiracy theories within the disinformation ecosystem of online platforms represents a growing concern. This paper delves into the complexities of this phenomenon. We conducted a comprehensive analysis of two distinct X (formerly known as Twitter) datasets: one comprising users with conspiracy theorizing patterns and another made of users lacking such tendencies and thus serving as a control group. The distinguishing factors between these two groups are explored across three dimensions: emotions, idioms, and linguistic features. Our findings reveal marked differences in the lexicon and language adopted by conspiracy theorists with respect to other users. We developed a machine learning classifier capable of identifying users who propagate conspiracy theories based on a rich set of 871 features. The results demonstrate high accuracy, with an average F1 score of 0.88. Moreover, this paper unveils the most discriminating characteristics that define conspiracy theory propagators.
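For readers unfamiliar with the metric, the F1 score reported above is the harmonic mean of precision and recall. The sketch below computes it for a binary task and then averages it over evaluation folds. Treating the "average" as a mean over folds is an assumption, since the abstract does not say how the average is taken, and the labels shown are made-up toy data rather than results from the paper.

```python
from statistics import mean


def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy folds (1 = conspiracy user, 0 = control user); purely illustrative.
folds = [
    ([1, 1, 1, 0, 0, 0], [1, 1, 0, 0, 0, 1]),
    ([1, 0, 1, 0], [1, 0, 1, 0]),
]
fold_scores = [f1_score(y_true, y_pred) for y_true, y_pred in folds]
average_f1 = mean(fold_scores)
```

Because the two classes are balanced in the dataset described here (7,210 users each), F1 and accuracy tend to move together, but F1 still penalizes a classifier that trades recall for precision or vice versa.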

Paper Structure

This paper contains 13 sections, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: SHAP values for the 20 most important features considering all features
  • Figure 2: F1 score based on the number of features used for classification (ordered by feature importance) considering all features
  • Figure 3: SHAP values for the 20 most important features in the emotions set
  • Figure 4: Descriptive statistics relative to the analyzed idioms, for control group users and conspiracy users
  • Figure 5: SHAP values for the 20 most important features in the linguistic set
  • ...and 1 more figure