Fake News Detection via Wisdom of Synthetic & Representative Crowds
François t'Serstevens, Roberto Cerina, Giulia Piccillo
TL;DR
The paper tackles the democratic legitimacy gap in fake-news detection by combining wisdom of crowds with hierarchical Bayesian modeling and post-stratification to produce population-representative veracity scores and state-level share-risk estimates. It contrasts naive crowd estimates with model-based estimates that learn from crowd demographics and tweet context via an ordinal logistic model, then post-stratifies to population personae using MrP, yielding metrics such as model-population and model-balance that converge in interpretation. The results reveal that fake-news sharing is generally rare but shows partisan patterns that depend on how fake news is defined, with Democrats consistently less likely to share under standard metrics and substantial state-level heterogeneity. The approach provides a scalable, transparent framework for democratic fake-news moderation, offering actionable population- and state-level insights and highlighting the value of incorporating uncertainty and representativeness in crowd-based assessments.
Abstract
Social media companies have struggled to provide a democratically legitimate definition of "Fake News". Reliance on expert judgment has attracted criticism due to a general trust deficit and political polarisation. Approaches reliant on the "wisdom of the crowds" are a cost-effective, transparent and inclusive alternative. This paper provides a novel end-to-end methodology to detect fake news on X via "wisdom of the synthetic & representative crowds". We deploy an online survey on the Lucid platform to gather veracity assessments from crowd-workers for a number of pandemic-related tweets. Borrowing from the MrP literature, we train a Hierarchical Bayesian model to predict the veracity of each tweet from the perspective of different personae from the population of interest. We then weight the predicted veracity assessments according to a representative stratification frame, such that decisions about "fake" tweets are representative of the overall polity of interest. Based on these aggregated scores, we analyse a corpus of tweets and perform a second MrP to generate state-level estimates of the number of people who share fake news. We find small but statistically meaningful heterogeneity in fake news sharing across US states. At the individual level: i. sharing fake news is generally rare, with an average sharing probability interval of [0.07, 0.14]; ii. there is strong evidence that Democrats share less fake news, accounting for a reduction in the sharing odds of [57.3%, 3.9%] relative to the average user; iii. when Republican definitions of fake news are used, it is Republicans who show a decrease in the propensity to share fake news, worth [50.8%, 2.0%]; iv. there is some evidence that women share less fake news than men, an effect worth a [29.5%, 4.9%] decrease.
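
The weighting step described in the abstract can be illustrated with a minimal sketch. It covers only the post-stratification arithmetic and assumes that per-persona veracity predictions are already available from some fitted model; the function name `poststratify`, the three personae, and every number below are hypothetical and are not taken from the paper.

```python
def poststratify(cell_predictions, cell_counts):
    """Return the population-weighted mean of per-cell predictions.

    cell_predictions: predicted probability that a tweet is judged fake,
        one value per persona (cell) of the stratification frame.
    cell_counts: number of people in the target population in each cell.
    """
    if len(cell_predictions) != len(cell_counts):
        raise ValueError("predictions and counts must have the same length")
    total = sum(cell_counts)
    if total <= 0:
        raise ValueError("the stratification frame must contain people")
    weighted = sum(p * n for p, n in zip(cell_predictions, cell_counts))
    return weighted / total


# Hypothetical example: three personae with made-up predictions and counts.
cell_predictions = [0.20, 0.60, 0.40]
cell_counts = [500, 300, 200]

# Unweighted mean across cells, ignoring how common each persona is.
unweighted_score = sum(cell_predictions) / len(cell_predictions)

# Post-stratified score: each persona counts in proportion to its
# share of the target population.
representative_score = poststratify(cell_predictions, cell_counts)

print(round(unweighted_score, 3), round(representative_score, 3))
```

In this toy frame the first persona is the most common and has the lowest prediction, so the post-stratified score falls below the unweighted mean. A full MrP pipeline would additionally propagate posterior uncertainty by repeating this weighting for each posterior draw, which the sketch omits.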
