Mapping the interaction between science and misinformation in COVID-19 tweets

Lucila G. Alvarez-Zuzek, Juan P. Bascur, Anna Bertani, Riccardo Gallotti, Vincent A. Traag

TL;DR

This study examines the interaction between science and misinformation on Twitter (now X) using a database of ~407M COVID-19-related tweets. The findings indicate that misinformation is not driven by a lack of exposure to science, and they raise critical questions about open science practices, particularly the role of preprints in amplifying misleading narratives.

Abstract

During the COVID-19 pandemic, scientific knowledge evolved rapidly, accompanied by a surge of misinformation, labelled an infodemic by the WHO. In this context, we study the interaction between science and misinformation on Twitter (now X) using a database of ~407M COVID-19-related tweets. We classify URL reliability with Media Bias/Fact Check and use Altmetric data to identify scientific publications. We find that among ~1.2M users who shared science, 45% also shared unreliable content. Scientific papers circulated by these users were more often preprints, slightly more likely to be retracted, less cited, and published in lower-impact journals. Our findings indicate that misinformation is not driven by a lack of exposure to science; instead, they raise critical questions about open science practices, particularly the role of preprints in amplifying misleading narratives. Our results underscore the importance of proactive scientific engagement on social media in countering misinformation and reinforcing trust in science during global crises.
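The headline overlap statistic (the share of science-sharing users who also shared unreliable content) can be illustrated with a minimal sketch. This is not the authors' pipeline: the domain sets below are hypothetical stand-ins for the Media Bias/Fact Check ratings and the Altmetric matching, and the function names `domain` and `overlap_share` are illustrative only.

```python
from urllib.parse import urlparse

# Hypothetical stand-ins: the paper uses Media Bias/Fact Check ratings and
# Altmetric records, not hard-coded domain sets like these.
UNRELIABLE_DOMAINS = {"example-unreliable.test"}
SCIENCE_DOMAINS = {"doi.org"}


def domain(url):
    """Return the lower-cased host of a URL without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def overlap_share(tweets):
    """Fraction of science-sharing users who also shared unreliable content.

    `tweets` is an iterable of (user_id, url) pairs.
    """
    science_users, unreliable_users = set(), set()
    for user, url in tweets:
        d = domain(url)
        if d in SCIENCE_DOMAINS:
            science_users.add(user)
        elif d in UNRELIABLE_DOMAINS:
            unreliable_users.add(user)
    if not science_users:
        return 0.0
    return len(science_users & unreliable_users) / len(science_users)


# Toy data with made-up users and URLs.
toy_tweets = [
    ("a", "https://doi.org/10.0000/example-1"),
    ("a", "https://www.example-unreliable.test/post"),
    ("b", "https://doi.org/10.0000/example-2"),
]
share = overlap_share(toy_tweets)
```

On the toy data, one of the two science-sharing users also shared an unreliable link, so `share` is 0.5; the paper reports 45% for its ~1.2M science-sharing users.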

Paper Structure

This paper contains 22 sections, 1 equation, 8 figures, 3 tables.

Figures (8)

  • Figure 1: Network of publications co-shared by reliable and unreliable users. To examine in more detail how scientific publications are discussed on Twitter, we construct the network of publications co-tweeted by the same user. Each node represents a scientific publication, and an edge connects two nodes if both articles were tweeted by at least one common user. From the resulting network, we consider the top $N=2,000$ nodes and $E=990,930$ edges. For clarity of the visualisation, only $8,000$ edges are displayed. Node size represents the number of users who mentioned the corresponding article in a post. Node colour indicates the proportion of these users who also shared at least one unreliable article, normalised by the total number of users sharing the article (i.e., the node size). Colours range from violet ($0$) to yellow ($1$), with violet indicating a lower prevalence of users who also shared unreliable content and yellow indicating a higher prevalence. For visualisation purposes, the colour scale is truncated to the interval $[0.1, 0.4]$, so that the extreme colours correspond to the two boundaries of the truncated scale: violet denotes articles for which at most $10\%$ of sharing users also shared unreliable content, and yellow denotes articles for which at least $40\%$ did.
  • Figure 2: Interplay between scientific and untrustworthy sources as a function of scientists' Twitter activity. The x-axis represents the proportion of posts made by scientific users, calculated as the number of posts shared by scientists normalised by the total number of posts in each country. The y-axis, on a logarithmic scale, shows the ratio of posts containing scientific sources to those containing untrustworthy sources. Data are aggregated by country over the entire period from 2020 to 2023. Countries are colour-coded by continent: North America (which includes Central America and Panama), South America, Europe, and Oceania. The size of the dots corresponds to the average number of tweets per scientist. Bars inside the figure correspond to the total number of posts (left) and the total number of unique users sharing reliable and untrustworthy content for scientist and non-scientist users. We discuss some noteworthy countries in more detail in the main text.
  • Figure 3: Statistics on scientific and untrustworthy content. The left-middle panel displays the density of sentiment of posts containing scientific and untrustworthy content from scientist users. The sentiment score is computed with VADER (Hutto and Gilbert, 2014) and ranges from $-1$ (most extreme negative) to $1$ (most extreme positive), with $0$ as the neutral point. This metric allows us to assign a single unidimensional measure of the emotions expressed in the posts based on the emojis and the text. The right panel represents the level of scientific sources shared by individual users as a function of their number of followers. For each user, we calculate the average proportion of scientific content shared. We consider only posts that contain either scientific sources or untrustworthy sources. We group users into $35$ bins based on their follower count, and we compute the average level of scientific sharing by aggregating across the users within each bin.
  • Figure S1: Tweet statistics over time. Our dataset spans from January 22, 2020, to March 18, 2023. Panel (a) shows the total number of tweets over time: all tweets in orange, tweets containing untrustworthy sources in dark pink, unreliable sources in pink (which also include untrustworthy sources), reliable sources in light blue (which also include science), and science in dark blue. Panel (b) provides a focused view of the proportion of tweets, excluding the all-tweets curve for clarity. Panel (d) shows the proportion of exposure for the extreme cases of reliability only: untrustworthy and science.
  • Figure S2: (a) User-wise correlation between the number of unreliable tweets and the number of DOI tweets. (b) User-wise partial correlation between the residual number of unreliable tweets and the residual number of DOI tweets after regressing on the total number of tweets. (c) User-wise correlation between the fraction of unreliable tweets and the fraction of DOI tweets.
  • ...and 3 more figures
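
The co-sharing network described in the Figure 1 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the input format (pairs of user and article identifiers), the set of users flagged as having shared unreliable content, and the names `build_cosharing_network` and `truncate_colour` are all hypothetical. Only the definitions of edges, node size, node colour, and the $[0.1, 0.4]$ truncation come from the caption.

```python
from collections import defaultdict
from itertools import combinations


def build_cosharing_network(shares, unreliable_users):
    """Build the article co-sharing network from (user_id, article_id) pairs.

    Returns (edges, node_size, node_colour):
      edges       - set of article pairs tweeted by at least one common user
      node_size   - number of distinct users who shared each article
      node_colour - fraction of those users who also shared unreliable content
    """
    articles_by_user = defaultdict(set)
    users_by_article = defaultdict(set)
    for user, article in shares:
        articles_by_user[user].add(article)
        users_by_article[article].add(user)

    # Two articles are linked if some user tweeted both of them.
    edges = set()
    for articles in articles_by_user.values():
        edges.update(combinations(sorted(articles), 2))

    node_size = {a: len(users) for a, users in users_by_article.items()}
    node_colour = {
        a: len(users & unreliable_users) / len(users)
        for a, users in users_by_article.items()
    }
    return edges, node_size, node_colour


def truncate_colour(value, low=0.1, high=0.4):
    """Clip a colour value to the truncated scale used for display."""
    return min(max(value, low), high)


# Toy data with made-up identifiers.
toy_shares = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "B"), ("u2", "C"),
    ("u3", "A"), ("u3", "B"),
]
edges, node_size, node_colour = build_cosharing_network(toy_shares, {"u2"})
```

On the toy data, articles A and B are linked through users u1 and u3, and B and C through u2. Article C is shared only by a user who also shared unreliable content, so its colour value is 1 before truncation and 0.4 after. Enumerating all article pairs per user grows quadratically with the number of articles a user shared, so a dataset of the size reported in the paper would likely need a sparser or thresholded construction.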