Uncertainty-Aware Estimation of Mis/Disinformation Prevalence on Social Media

Ishari Amarasinghe; Salvatore Romano; Jacopo Amidei; Emmanuel M. Vincent; Andreas Kaltenbrunner

Abstract

Estimating the prevalence of mis/disinformation on social media is crucial for designing mitigation strategies that limit its impact. Yet such estimates are subject to several sources of uncertainty that are rarely quantified jointly. In this study, we present a methodological contribution in which confidence intervals are used to quantify the uncertainty around mis/disinformation prevalence estimates. The analysis draws on a multi-platform, multilingual dataset annotated by professional fact-checkers. Data were collected between March and April 2025 from Facebook, Instagram, LinkedIn, TikTok, X/Twitter, and YouTube across four EU Member States (France, Poland, Slovakia, and Spain). We account for three sources of uncertainty: (i) sample uncertainty, (ii) annotation uncertainty arising from human disagreement and misclassification, and (iii) data retrieval uncertainty induced by keyword-based data collection. First, we estimate the uncertainty arising from each source separately using confidence intervals, simulation-based methods, and bootstrapping. Then, we combine multinomial simulations of annotator behaviour with keyword and post resampling to capture the joint impact of measurement uncertainty on mis/disinformation prevalence estimates. The proposed approach highlights the importance of uncertainty-aware estimation of mis/disinformation prevalence for robust analysis. The empirical results show that the uncertainty induced by keyword-based data retrieval can exceed baseline variability, leading to wider confidence intervals around prevalence estimates.
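As a rough illustration of two of the uncertainty sources named in the abstract, the sketch below computes a Wilson score interval for sample uncertainty and a keyword-level bootstrap for data retrieval uncertainty. It is not the authors' implementation: the function names, the toy data, the choice of the Wilson interval, the default of 500 resamples, and the assumption that each post is retrieved by exactly one keyword are all hypothetical choices made for this example.

```python
import numpy as np


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a proportion k/n (z=1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * np.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def keyword_bootstrap(labels, keywords, n_boot=500, seed=0):
    """Percentile interval for prevalence when whole keywords are resampled.

    Assumes (hypothetically) that every post belongs to exactly one keyword,
    so a keyword acts as a cluster of posts.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=float)
    keywords = np.asarray(keywords)
    unique = np.unique(keywords)
    # Per-keyword counts of flagged posts and of all posts.
    flagged = np.array([labels[keywords == kw].sum() for kw in unique])
    totals = np.array([(keywords == kw).sum() for kw in unique])
    estimates = np.empty(n_boot)
    for b in range(n_boot):
        # Draw keywords with replacement, then pool their posts.
        idx = rng.integers(0, len(unique), size=len(unique))
        estimates[b] = flagged[idx].sum() / totals[idx].sum()
    lo, hi = np.percentile(estimates, [2.5, 97.5])
    return lo, hi


# Toy data (invented for illustration): 1 = mis/disinformation, 0 = other.
toy_labels = [1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]
toy_keywords = ["a"] * 4 + ["b"] * 4 + ["c"] * 4

sample_ci = wilson_interval(sum(toy_labels), len(toy_labels))
retrieval_ci = keyword_bootstrap(toy_labels, toy_keywords)
print("sample CI:", sample_ci)
print("keyword bootstrap CI:", retrieval_ci)
```

Comparing the widths of the two intervals on real data would show whether resampling keywords adds variability beyond plain sampling error, which is the kind of comparison the abstract reports.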

Paper Structure (32 sections, 3 equations, 10 figures, 8 tables, 3 algorithms)

Introduction
Background and Related Work
Definitions and EU regulatory context
Language as an implementation variable
Platform constraints, cross-platform comparability, and prevalence estimation
Methods
Data Collection
Data Annotation and Pre-processing
Analysis Units and Aggregation Levels
Baseline Prevalence Estimates
Uncertainty Estimates
Annotation-related uncertainty
Modelling Annotation Disagreement
Multinomial Simulation of Annotation Error
Data retrieval uncertainty
...and 17 more sections

Figures (10)

Figure 1: Overview of the data collection and annotation process
Figure 2: Distribution of Labels per Language in the pre-processed corpus
Figure 3: Distribution of Labels per Platform in the pre-processed corpus
Figure 4: Junior-to-post-review label transitions in the double-coded subset, shown separately for each language, after grouping the original annotation scheme into three disjoint groups. Rows correspond to Junior first-round group assignments, and columns correspond to second-round grouped labels after Senior review.
Figure 5: Mean 3×3 correction matrices under multinomial annotation uncertainty (Junior-only remainder). Each cell reports the mean expected number of posts reassigned across 500 multinomial simulations, with 2.5–97.5 percentile intervals shown in brackets.
...and 5 more figures
