
Uncertainty-Aware Estimation of Mis/Disinformation Prevalence on Social Media

Ishari Amarasinghe, Salvatore Romano, Jacopo Amidei, Emmanuel M. Vincent, Andreas Kaltenbrunner

Abstract

Estimating the prevalence of mis/disinformation on social media is crucial for designing mitigation strategies to limit its impact. Yet such estimates are subject to several sources of uncertainty that are rarely quantified jointly. In this study, we present a methodological contribution that uses confidence intervals to quantify the uncertainties surrounding mis/disinformation prevalence estimates. The analysis draws on a multi-platform, multilingual dataset annotated by professional fact-checkers. Data were collected between March and April 2025 from Facebook, Instagram, LinkedIn, TikTok, X/Twitter, and YouTube across four EU Member States (France, Poland, Slovakia, and Spain). We account for three sources of uncertainty: (i) sampling uncertainty, (ii) annotation uncertainty arising from human disagreement and misclassification, and (iii) data retrieval uncertainty induced by keyword-based data collection. First, we estimate the uncertainty arising from each source separately using confidence intervals, simulation-based methods, and bootstrapping. Then, we combine multinomial simulations of annotator behaviour with resampling of keywords and posts to capture the joint impact of measurement uncertainty on mis/disinformation prevalence estimates. The proposed approach highlights the importance of uncertainty-aware estimation of mis/disinformation prevalence for robust analysis. Our empirical results show that the uncertainty introduced by keyword-based data retrieval can exceed baseline variability, leading to wider confidence intervals around prevalence estimates.
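As a rough illustration of two of the uncertainty sources named in the abstract, the sketch contrasts a closed-form interval for sampling uncertainty with a two-stage keyword-then-post bootstrap for retrieval uncertainty. The abstract does not specify which confidence interval or resampling scheme the authors use, so the Wilson score interval, the "a post is retrieved if it matches any drawn keyword" model, the `(matched_keywords, label)` post representation, and the function names `wilson_interval` and `keyword_post_bootstrap` are all hypothetical choices made for illustration.

```python
import random
import statistics


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (sampling uncertainty only).

    Assumption: Wilson is one common choice; the paper may use a different interval.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * ((p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def keyword_post_bootstrap(posts, keywords, n_boot=1000, seed=0):
    """Hypothetical two-stage bootstrap for keyword-based retrieval uncertainty.

    posts: list of (matched_keywords: set[str], label: int), label 1 = mis/disinformation.
    Each replicate (1) resamples the keyword list with replacement, (2) keeps the posts
    matched by at least one drawn keyword, (3) resamples those posts with replacement,
    and (4) records the resulting prevalence.
    Returns (mean prevalence, (2.5th percentile, 97.5th percentile)).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        drawn = set(rng.choices(keywords, k=len(keywords)))        # keyword resampling
        corpus = [label for kws, label in posts if kws & drawn]    # simulated retrieval
        if not corpus:
            continue  # this keyword draw retrieved nothing; skip the replicate
        sample = rng.choices(corpus, k=len(corpus))                # post resampling
        estimates.append(sum(sample) / len(sample))
    if len(estimates) < 2:
        raise ValueError("too few non-empty replicates to form an interval")
    # 39 cut points at 2.5%, 5%, ..., 97.5%; keep the outermost two.
    qs = statistics.quantiles(estimates, n=40, method="inclusive")
    return statistics.mean(estimates), (qs[0], qs[-1])
```

With real data, one would compare the width of `wilson_interval` on the full retrieved corpus with the bootstrap interval; a noticeably wider bootstrap interval would be in line with the abstract's observation that keyword-based retrieval can add variability beyond sampling alone.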

Paper Structure (32 sections, 3 equations, 10 figures, 8 tables, 3 algorithms)

Figures (10)

  • Figure 1: Overview of the Data collection and annotation process
  • Figure 2: Distribution of Labels per Language in the pre-processed corpus
  • Figure 3: Distribution of Labels per Platform in the pre-processed corpus
  • Figure 4: Junior-to-post-review label transitions in the double-coded subset, shown separately for each language, after grouping the original annotation scheme into three disjoint groups. Rows correspond to Junior first-round group assignments, and columns correspond to second-round grouped labels after Senior review.
  • Figure 5: Mean 3×3 correction matrices under multinomial annotation uncertainty (Junior-only remainder). Each cell reports the mean expected number of posts reassigned across 500 multinomial simulations, with 2.5–97.5 percentile intervals shown in brackets.
  • ...and 5 more figures
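Figures 4 and 5 describe reassigning Junior-only annotations using the label transitions observed in the double-coded subset, repeated over 500 multinomial simulations. A minimal sketch of that idea follows, assuming the transition probabilities are simply the row-normalised Junior-to-post-review counts. The group names `G0`, `G1`, `G2`, the function names, and the input format are hypothetical, since the captions neither name the three groups nor specify the estimator.

```python
import random
import statistics
from collections import Counter

GROUPS = ("G0", "G1", "G2")  # hypothetical placeholders for the three disjoint label groups


def row_normalise(counts):
    """Turn a 3x3 matrix of Junior -> post-review transition counts into row probabilities."""
    probs = []
    for row in counts:
        total = sum(row)
        if total == 0:
            raise ValueError("each Junior group needs at least one double-coded post")
        probs.append([c / total for c in row])
    return probs


def simulate_corrections(transition_counts, junior_only_counts, n_sims=500, seed=0):
    """In each simulation, reassign the Junior-only posts of each group with one multinomial
    draw over that group's row probabilities. Returns a 3x3 grid of
    (mean count, 2.5th percentile, 97.5th percentile) across simulations."""
    rng = random.Random(seed)
    probs = row_normalise(transition_counts)
    k = len(GROUPS)
    if len(junior_only_counts) != k:
        raise ValueError("need one Junior-only count per group")
    draws = [[[] for _ in range(k)] for _ in range(k)]  # draws[i][j]: one count per simulation
    for _ in range(n_sims):
        for i, n_i in enumerate(junior_only_counts):
            # Multinomial(n_i, probs[i]) realised as n_i independent categorical draws.
            cell = Counter(rng.choices(range(k), weights=probs[i], k=n_i))
            for j in range(k):
                draws[i][j].append(cell[j])
    summary = []
    for i in range(k):
        row = []
        for j in range(k):
            vals = draws[i][j]
            qs = statistics.quantiles(vals, n=40, method="inclusive")
            row.append((statistics.mean(vals), qs[0], qs[-1]))
        summary.append(row)
    return summary
```

Summing the columns of the simulated counts, together with the double-coded posts (which presumably keep their reviewed labels), would yield a distribution of corrected group sizes from which a prevalence interval could be read; that aggregation step is omitted here.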