
Analyzing News Engagement on Facebook: Tracking Ideological Segregation and News Quality in the Facebook URL Dataset

Emma Fraxanet, Andreas Kaltenbrunner, Fabrizio Germano, Vicenç Gómez

TL;DR

This study analyzes four years (2017–2020) of engagement with news URLs on Facebook using the Facebook Privacy-Protected Full URLs Dataset to quantify ideological segregation and news-quality exposure. By integrating domain-level ideology and quality scores with user political affinities and applying weighted averages and change-point detection, the authors identify two major shifts in engagement that align with Facebook News Feed updates, revealing a widening liberal–conservative ideology gap and fluctuations in low-quality news consumption. The work highlights platform-design effects on news diets, demonstrates robust aggregate trends despite privacy-preserving noise, and situates findings within ongoing conversations about algorithmic influence on polarization. While descriptive and limited to highly engaged, top-ranked news domains, the results provide empirical benchmarks for understanding how engagement-driven algorithms may shape information quality and partisan exposure on social media.
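The Figure 4 caption states that potential change points were identified through piecewise linear regression. The sketch below shows one way to do this. It fits a discontinuous piecewise-linear model with exactly two breakpoints to a monthly series and searches every admissible breakpoint pair for the one that minimizes total squared error. This may not match the authors' procedure. The names `find_breakpoints`, `segment_sse`, `n_breaks`, and `min_seg` are hypothetical, and unlike the paper's shaded regions the sketch returns point estimates only.

```python
import numpy as np
from itertools import combinations


def segment_sse(t, y):
    """Sum of squared residuals of an ordinary least-squares line fitted to (t, y)."""
    A = np.column_stack([t, np.ones_like(t)])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def find_breakpoints(y, n_breaks=2, min_seg=3):
    """Exhaustive search for the breakpoints that minimize the total SSE of a
    discontinuous piecewise-linear fit (hypothetical helper, not the paper's code).

    Returns a tuple with the index at which each new segment starts.
    """
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y), dtype=float)  # month index 0, 1, 2, ...
    best, best_sse = None, np.inf
    for cuts in combinations(range(min_seg, len(y) - min_seg + 1), n_breaks):
        bounds = (0, *cuts, len(y))
        # Skip partitions that leave any segment shorter than min_seg points.
        if any(b - a < min_seg for a, b in zip(bounds, bounds[1:])):
            continue
        sse = sum(segment_sse(t[a:b], y[a:b]) for a, b in zip(bounds, bounds[1:]))
        if sse < best_sse:
            best, best_sse = cuts, sse
    return best
```

For example, take a noise-free synthetic 48-month series built from three linear pieces that start at months 0, 13, and 37. For that series, `find_breakpoints(y)` returns `(13, 37)`. Exhaustive search is cheap at this scale (under a thousand candidate pairs for 48 months). Longer series or more breakpoints would call for a dynamic-programming segmentation instead.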

Abstract

The Facebook Privacy-Protected Full URLs Dataset was released to enable independent academic research on the impact of Facebook's platform on society while ensuring user privacy. The dataset has been used in several studies to analyze the relationship between social media engagement and societal issues such as misinformation, polarization, and the quality of consumed news. In this paper, we conduct a comprehensive analysis of engagement with popular news domains over four years, from January 2017 to December 2020, focusing on user engagement metrics for news URLs in the U.S. We combine the ideological alignment of news sources and a composite score of their quality and reliability with users' political preferences. From these, we construct weighted averages of the ideology and quality of news consumption for liberal, conservative, and moderate audiences. This allows us to track the evolution of (i) the ideological gap in news consumption between liberals and conservatives and (ii) the average quality of each group's news consumption. We identify two major shifts in these trends, each associated with a change in engagement. In both, the ideological gap widens and news quality declines; however, engagement rises in the first shift and falls in the second. Finally, we contextualize these trends by linking them to two major Facebook News Feed updates. Our findings provide empirical evidence for understanding user engagement with news, and the ideological leaning and reliability of that news, during the period covered by the dataset.
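The click-weighted averages and the liberal–conservative gap can be sketched in a few lines. The sketch assumes engagement is already aggregated into hypothetical `(user_class, ideology, quality, clicks)` records, one per user class and domain. Following the Figure 5 caption, it also assumes ideology normalized to [0, 1] and clicks used as weights. The low-quality share uses the caption's threshold $T_\text{low}=0.6$ and assumes "low quality" means a score strictly below it. The function names and class labels are illustrative, not the authors' code.

```python
from collections import defaultdict


def class_metrics(records, t_low=0.6):
    """Click-weighted ideology, quality, and low-quality share per user class.

    records: iterable of (user_class, ideology, quality, clicks) tuples, with
    ideology normalized to [0, 1] (hypothetical input format).
    """
    sums = defaultdict(lambda: {"w": 0.0, "ideo": 0.0, "qual": 0.0, "low": 0.0})
    for cls, ideology, quality, clicks in records:
        s = sums[cls]
        s["w"] += clicks
        s["ideo"] += clicks * ideology
        s["qual"] += clicks * quality
        if quality < t_low:  # assumption: "low quality" means strictly below T_low
            s["low"] += clicks
    return {
        cls: {
            "mean_ideology": s["ideo"] / s["w"],
            "mean_quality": s["qual"] / s["w"],
            "low_quality_share": s["low"] / s["w"],
        }
        for cls, s in sums.items()
    }


def ideological_gap(metrics):
    """Conservative minus liberal click-weighted mean ideology."""
    return metrics["conservative"]["mean_ideology"] - metrics["liberal"]["mean_ideology"]
```

Suppose liberal users give 300 clicks to a domain with ideology 0.2 and 100 clicks to one with ideology 0.6. Their mean ideology is then (0.2·300 + 0.6·100)/400 = 0.3. A growing value of `ideological_gap` over successive months would correspond to the widening gap described above.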

Paper Structure

This paper contains 26 sections, 7 equations, 7 figures, and 1 table.

Figures (7)

  • Figure 1: Selected news domains for this study, distributed by ideology and quality scores (bin widths: ideology $0.094$, quality $0.05$; ticks mark bin centers). Positive ideology values denote a conservative leaning, and negative values a liberal leaning. Higher quality scores indicate more reliable, higher-quality domains. The two dimensions are negatively correlated ($r = -0.35$, $p < 0.001$).
  • Figure 2: Volume of URLs per query and engagement decay per URL (news domains). (A) In blue, the total number of URLs retrieved by each monthly query; in orange, those within a 3-month window after each URL's posting date (the ones used in the analysis); in green, the number of new URLs added to the dataset each month. (B) Average (normalized) engagement. Engagement decays quickly, and on average most of it occurs during the first few months. Although this figure shows data from 2019, similar patterns are observed across all analyzed years.
  • Figure 3:
  • Figure 4: Engagement timeline. (A) Total monthly view counts for news and other domains (passive engagement). (B) Aggregate of active engagement metrics for news domains (e.g., likes, shares, comments). (C) Share and comment counts for news domains. Text labels indicate the maximum values of the noise-uncertainty confidence intervals (CI), calculated using Eq. 1 in Supplementary Information 2A. Shaded regions mark potential change points identified through piecewise linear regression. Dashed lines mark the dates of two major documented algorithmic changes.
  • Figure 5: Evolution of engagement (clicks) in terms of content ideology and content quality. (A) and (D) show the evolution of the weighted averages of domain ideology (normalized between 0 and 1) and domain quality, respectively, as in Eq. \ref{eq:mu}. Averages are computed for each user class (i.e., Conservative, Liberal, Centrist, or without a defined political page affinity, PPA) using clicks as weights. (B) and (E) show the corresponding weighted standard deviations, as in Eq. \ref{eq:w_avg}. (C) shows the ideological gap between the conservative and liberal averages, which can serve as a proxy for ideological segregation in news consumption (Eq. \ref{eq:gap}). (F) shows the proportion of clicks directed toward low-quality domains (Eq. \ref{eq:low}) with $T_\text{low}=0.6$. Noise uncertainty intervals are too small to be visible in any panel; for all shown metrics with noisy denominators, $SNR > 10^5$. Bootstrapped uncertainty intervals for the weighted averages also have very small standard deviations (on the order of 0.001), indicating high statistical precision. Dashed lines indicate relevant algorithmic updates. Shaded areas are calculated as in Figure 4.
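Figure 5 also reports click-weighted standard deviations (panels B and E) and bootstrapped uncertainty for the weighted averages. The captions specify neither the variance estimator nor the resampling unit. The sketch below therefore assumes a population-style weighted variance with no small-sample correction. It also assumes a bootstrap that resamples domains with replacement, keeping each domain's click weight. The helper names `weighted_mean_std` and `bootstrap_se` are hypothetical.

```python
import math
import random


def weighted_mean_std(values, weights):
    """Click-weighted mean and population-style weighted standard deviation."""
    total = sum(weights)
    mu = sum(w * x for x, w in zip(values, weights)) / total
    var = sum(w * (x - mu) ** 2 for x, w in zip(values, weights)) / total
    return mu, math.sqrt(var)


def bootstrap_se(values, weights, n_boot=1000, seed=0):
    """Standard deviation of the weighted mean across bootstrap resamples.

    Assumption: domains (value, weight pairs) are resampled with replacement.
    """
    rng = random.Random(seed)  # seeded for reproducible resamples
    n = len(values)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        mu, _ = weighted_mean_std([values[i] for i in idx], [weights[i] for i in idx])
        estimates.append(mu)
    center = sum(estimates) / n_boot
    return math.sqrt(sum((e - center) ** 2 for e in estimates) / (n_boot - 1))
```

With values `[0.2, 0.6]` and weights `[300, 100]`, the weighted mean is 0.3 and the weighted variance is (300·0.01 + 100·0.09)/400 = 0.03. Resampling whole domains reflects the idea that uncertainty comes from which domains are observed rather than from individual clicks. Other resampling units (URLs, months) would be equally plausible choices.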