Identifying and characterizing superspreaders of low-credibility content on Twitter

Matthew R. DeVerna, Rachith Aiyappa, Diogo Pacheco, John Bryden, Filippo Menczer

TL;DR

This work quantitatively confirms that users who consistently disseminate a disproportionate amount of low-credibility content (so-called superspreaders) are at the center of the misinformation problem, and introduces simple metrics to predict the top superspreaders several months into the future.

Abstract

The world's digital information ecosystem continues to struggle with the spread of misinformation. Prior work has suggested that users who consistently disseminate a disproportionate amount of low-credibility content -- so-called superspreaders -- are at the center of this problem. We quantitatively confirm this hypothesis and introduce simple metrics to predict the top superspreaders several months into the future. We then conduct a qualitative review to characterize the most prolific superspreaders and analyze their sharing behaviors. Superspreaders include pundits with large followings, low-credibility media outlets, personal accounts affiliated with those media outlets, and a range of influencers. They are primarily political in nature and use more toxic language than the typical user sharing misinformation. We also find concerning evidence that suggests Twitter may be overlooking prominent superspreaders. We hope this work will further public understanding of bad actors and promote steps to mitigate their negative impacts on healthy digital discourse.

Paper Structure (26 sections, 1 equation, 5 figures, 1 table)

Figures (5)

  • Figure 1: Top: The effect of removing accounts that created low-credibility posts during January and February 2020 (observation period) on the proportion of untrustworthy content present during the following eight months (evaluation period). Nodes (accounts) are removed one by one from a retweet network in order of ascending rank, based on the metrics indicated in the legend. The remaining proportion of retweets of low-credibility posts is plotted versus the number of nodes removed. The lowest value for all curves is not zero, reflecting the fact that approximately 13% of the low-credibility retweets in the evaluation network are by accounts that did not create low-credibility posts during the observation period. Bottom: Likelihood that the difference between the performance of the $h$-index and Influence metrics arose by random chance. The most prolific superspreaders according to these two metrics remove a similar amount of low-credibility content. To compare them for any given number of removed accounts, we conduct Cramér-von Mises two-sample tests with increasingly large samples and plot each test's $P$-value on the $y$-axis. After removing more than 50 accounts (gray area), the Influence metric performs significantly better ($P < 0.05$). The difference is not significant if fewer accounts are removed.
  • Figure 2: Classification of superspreader accounts. A large portion (55.1%) of accounts are no longer active. For each class annotated with political affiliations, colors indicate the ideological split. The last group aggregates all accounts with political affiliations.
  • Figure 3: Low-credibility content sharing behavior of superspreaders (points) as captured by the boxplot distribution of the ratio $r_m$. Users identified via the $h$-index share a significantly higher ratio of untrustworthy sources than those identified with the Influence metric.
  • Figure 4: Distributions of language toxicity scores for superspreaders vs. all accounts in the low-credibility content ecosystem.
  • Figure 5: Relationship between suspension, verified status, and popularity of top 250 superspreaders. Top: Percentage of suspended superspreader accounts that are verified. Bottom: Percentage of suspended superspreader accounts based on numbers of followers.
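
The account-removal procedure described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`h_index`, `removal_curve`), the toy data, and the tie-breaking rule are all hypothetical. The sketch assumes that an account's $h$-index is the largest $h$ such that $h$ of its low-credibility posts were each retweeted at least $h$ times, and that removing an account deletes every evaluation-period retweet of posts it authored.

```python
from collections import defaultdict


def h_index(retweet_counts):
    """Largest h such that at least h posts have >= h retweets each.

    Assumed definition, adapted from the bibliometric h-index.
    """
    h = 0
    for i, count in enumerate(sorted(retweet_counts, reverse=True), start=1):
        if count >= i:
            h = i
        else:
            break
    return h


def removal_curve(ranked_accounts, eval_retweets):
    """Proportion of low-credibility retweets left after each removal.

    ranked_accounts: accounts ordered from top-ranked to bottom-ranked.
    eval_retweets: (original_author, retweeter) pairs observed in the
        evaluation period.
    Assumption: removing an account removes all retweets of its posts.
    """
    total = len(eval_retweets)
    per_author = defaultdict(int)
    for author, _retweeter in eval_retweets:
        per_author[author] += 1

    remaining = total
    curve = [1.0]  # nothing removed yet
    for account in ranked_accounts:
        remaining -= per_author.pop(account, 0)
        curve.append(remaining / total)
    return curve


# Toy observation period: retweet counts of each account's
# low-credibility posts (hypothetical data).
observed = {"a": [5, 3, 1], "b": [10], "c": [2, 2, 2]}
scores = {acct: h_index(counts) for acct, counts in observed.items()}

# Rank by descending h-index; ties are broken alphabetically (an assumption).
ranking = sorted(scores, key=lambda acct: (-scores[acct], acct))

# Toy evaluation period: account "d" was not seen during observation,
# so its retweets can never be removed.
evaluation = (
    [("a", "x")] * 4 + [("c", "y")] * 3 + [("b", "z")] * 2 + [("d", "w")]
)
curve = removal_curve(ranking, evaluation)
```

In this toy example the curve bottoms out above zero because account "d" authored low-credibility posts only in the evaluation period, which mirrors the non-zero floor noted in the Figure 1 caption.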