
Measuring the co-evolution of online engagement with (mis)information and its visibility at scale

Yueting Han, Paolo Turrini, Marya Bazzi, Giulia Andrighetto, Eugenia Polizzi, Manlio De Domenico

Abstract

Online attention is an increasingly valuable resource in the digital age, with extraordinary events such as the COVID-19 pandemic fuelling fierce competition around it. As misinformation pervades online platforms, users seek credible sources, while news outlets compete to attract and retain their attention. Here we measure the co-evolution of online "engagement" with (mis)information and its "visibility", where engagement corresponds to user interactions on social media, and visibility to fluctuations in user follower counts. Using over 100 million COVID-related retweets across 3 years, we analyse how user interactions and follower dynamics differ for factual, misleading and uncertain content. We observe that during major events (e.g., vaccine rollouts), users spreading factual content see rapid spikes in follower gains, whereas those sharing misleading content tend to sustain faster growth outside of these high-attention periods. We introduce two scalable modelling frameworks (simple contagion and biased convergence) that reproduce many of the observed differences in follower growth rates using temporal retweet network dynamics, providing evidence that content visibility co-evolves with user engagement. Our modelling lends itself to studying other large-scale events where online attention is at stake, such as climate and political debates.

Paper Structure

This paper contains 15 sections, 9 equations, 10 figures.

Figures (10)

  • Figure 1: Comparison of the original and filtered retweet networks. The filtered retweet network (purple) preserves key features of the original one (grey) across the entire timeframe from 17 March 2020 to 12 February 2023. (a) Retweeter-retweetee dynamics. "retweetees only" includes users who are only retweeted, "retweeters only" includes those who only retweet, and "retweeters & retweetees" includes the rest. Each component's percentage indicates the proportion of users it contains, and each edge's percentage represents the proportion of retweets between two components. Most percentages associated with the original network are similar to those in the filtered network. (b) Retweet category distribution. The percentage on each bar shows what fraction of the retweets from that specific category (i.e., factual, misleading, or uncertain) in the original network is retained in the filtered network. These percentages consistently remain around 25% across all three categories. (c) Retweet temporal distribution. The filtered network preserves the shape of the temporal distribution of the original network, capturing both its long-term declining trend and its short-term spikes.
  • Figure 2: Thresholding highly aligned users. Using the filtered retweet network, we categorise users as highly aligned with a campaign (factual, misleading, or uncertain) if >95% of the retweets they give or receive are of that content type. (a) User distribution by retweet proportions for different content types. The triangle heatmap depicts the user distribution based on the proportion of retweets for each content type that every individual user gives or receives. Three light-coloured corners (>95%) suggest a large number of users predominantly circulate only one type of content. Darker cells along the y-axis suggest users are much less inclined to circulate factual and misleading content together. (b) Percentage of retweets by highly aligned users at varying thresholds. For each content type, we calculate the percentage of corresponding retweets given or received by highly aligned users at varying thresholds. The figure displays notable jumps around the 95% threshold for all three content types, despite some differences in curve shapes. This indicates that users highly aligned (>95%) are involved in a large proportion of retweets.
  • Figure 3: Heterogeneity at the (a) global and (b) local level.
  • Figure 4: Sizes of disparity backbones for different significance levels $\bm{\alpha}$. For each value of $\alpha$, we calculate the fraction of weight, nodes, and edges kept in the backbones compared to the original network.
  • Figure 5: Topological properties of disparity backbones for different significance levels $\bm{\alpha}$. (a) Average clustering coefficient. (b) Edge weight distribution. For $\alpha \geq 0.05$, the weight distribution of the filtered network resembles that of the original network. However, for $\alpha = 10^{-5}$ and $\alpha = 10^{-10}$, this agreement breaks down for small weights. This suggests that filters with extremely small $\alpha$ may remove too many small-weight edges that could still be statistically significant at the local scale. (c), (d) Complementary cumulative degree distributions. The findings are similar to those in (b). Interestingly, the in-degree distribution follows a power-law pattern, with the power-law exponent $\beta - 1 = 0.9$, whereas the out-degree distribution is concave on a log-log scale. This is possibly because sources with high out-degree, while serving as statistically significant spreaders of COVID-related content, often distribute various types of news content, which we exclude as outside the scope of this paper.
  • ...and 5 more figures
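
The captions for Figures 2, 4 and 5 describe two procedures that are easy to lose track of in prose: thresholding "highly aligned" users at 95%, and extracting a disparity backbone at significance level $\alpha$. The following is a minimal Python sketch of both, not the authors' implementation. The function names, the `(retweeter, retweetee, category)` tuple layout and the `{(source, target): weight}` edge dictionary are hypothetical. The p-value $(1 - w/s)^{k-1}$ is the standard disparity-filter null model; keeping an edge when it is significant from either endpoint is an assumption here, since the captions do not say which directed variant was used.

```python
from collections import Counter, defaultdict


def highly_aligned_users(retweets, threshold=0.95):
    """Map each user to a content type if it exceeds `threshold` of their retweets.

    `retweets` is an iterable of (retweeter, retweetee, category) tuples
    (a hypothetical layout). Retweets given and received are pooled per user,
    following the wording of the Figure 2 caption.
    """
    counts = defaultdict(Counter)
    for retweeter, retweetee, category in retweets:
        counts[retweeter][category] += 1  # retweet given
        counts[retweetee][category] += 1  # retweet received
    aligned = {}
    for user, by_category in counts.items():
        category, top = by_category.most_common(1)[0]
        if top / sum(by_category.values()) > threshold:
            aligned[user] = category
    return aligned


def disparity_backbone(edges, alpha=0.05):
    """Return the edges of a directed weighted network significant at `alpha`.

    `edges` maps (source, target) to a positive weight. An edge is kept if it
    is significant from the source's outgoing side or the target's incoming
    side (assumed variant, see the lead-in).
    """
    out_strength, in_strength = defaultdict(float), defaultdict(float)
    out_degree, in_degree = Counter(), Counter()
    for (source, target), weight in edges.items():
        out_strength[source] += weight
        in_strength[target] += weight
        out_degree[source] += 1
        in_degree[target] += 1

    def p_value(weight, strength, degree):
        # Null model: the node's strength is split uniformly at random over
        # its `degree` edges. For degree 1 this evaluates to 1.0, so such an
        # edge can only be kept through its other endpoint.
        return (1.0 - weight / strength) ** (degree - 1)

    backbone = {}
    for (source, target), weight in edges.items():
        p_out = p_value(weight, out_strength[source], out_degree[source])
        p_in = p_value(weight, in_strength[target], in_degree[target])
        if min(p_out, p_in) < alpha:
            backbone[(source, target)] = weight
    return backbone
```

Sweeping `threshold` in the first function, or `alpha` in the second, and recording the fraction of retweets, nodes, edges and weight retained would give curves of the kind shown in Figure 2(b) and Figure 4.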