Engagement, User Satisfaction, and the Amplification of Divisive Content on Social Media

Smitha Milli, Micah Carroll, Yike Wang, Sashrika Pandey, Sebastian Zhao, Anca D. Dragan

TL;DR

The paper investigates whether Twitter's engagement-based ranking amplifies emotionally charged, partisan, and out-group hostile content, using a pre-registered algorithmic audit with 806 participants that compares the engagement-based timeline to a reverse-chronological baseline and to a stated-preference (SP) timeline derived from participant surveys. It uses reader judgments and GPT-4 labeling to assess tweet-level emotions, partisanship, and out-group animus, finding that the engagement-based timeline increases anger, polarization, and negative attitudes toward out-groups (e.g., $0.24$ SD increases in partisanship and out-group animus; readers' affect toward their out-group changes by $-0.17$ SD; author anger rises by $0.47$ SD). The SP timeline reduces negativity and animus relative to engagement, but may intensify in-group bias; an SP-OA variant that down-ranks out-group hostility further lowers out-group animus to about $17\%$ of political tweets, suggesting a feasible path to align content with stated preferences while mitigating divisive content. The findings highlight a need for nuanced ranking strategies that balance engagement with users' stated preferences to curb polarization and misperception on social platforms.

Abstract

In a pre-registered algorithmic audit, we find that, relative to a reverse-chronological baseline, Twitter's engagement-based ranking algorithm amplifies emotionally charged, out-group hostile content that users say makes them feel worse about their political out-group. Furthermore, we find that users do not prefer the political tweets selected by the algorithm, suggesting that the engagement-based algorithm underperforms in satisfying users' stated preferences. Finally, we explore the implications of an alternative approach that ranks content based on users' stated preferences and find a reduction in angry, partisan, and out-group hostile content, but also a potential reinforcement of pro-attitudinal content. The evidence underscores the necessity for a more nuanced approach to content ranking that balances engagement and users' stated preferences.

Paper Structure (38 sections, 4 equations, 52 figures, 38 tables)

Figures (52)

  • Figure 1: Average treatment effects (ATEs) for all outcomes, shown with 95% bootstrap confidence intervals (unadjusted for multiple testing). Effects are shown for two timelines, each relative to the reverse-chronological timeline (the zero line): (1) Twitter's own engagement-based timeline and (2) our exploratory timeline that ranks based on users' stated preferences. The outcomes shown here are based on reader judgments; the analogous effects for GPT-4-based labels can be found in \ref{appendix:gpt-effects}. ATEs are standardized using the standard deviation of outcomes in the reverse-chronological timeline (see \ref{app:ate-estimation} for details).
  • Figure 2: The distribution of political tweets and out-group animosity. The graph on the left shows the distribution of political tweets in each timeline, categorized by whether they align with the reader's in-group, out-group, or are moderate. Meanwhile, the graph on the right delineates the proportion of political tweets that express out-group animosity, broken down by whether they target the reader's in-group or out-group. The stated preference timeline has a lower percentage of tweets with animosity than the engagement timeline. However, this decrease is mainly due to a decrease in animosity towards the reader's in-group; the percentage of tweets with animosity towards the reader's out-group stays roughly the same.
  • Figure 3: Sample view of the survey: users saw tweets embedded alongside each question for reference.
  • Figure 4: Histograms of tweets in the engagement-based and chronological timelines along four properties: the author's number of followers, the tweet's number of likes, the tweet's number of retweets, and the tweet's age. All graphs are plotted on a log-scale $x$-axis because all properties have a long tail of extreme outliers. The dashed lines show the median value for each timeline.
  • Figure 5: The distribution of political tweets in user timelines. The number of political tweets in each timeline is calculated using the participant's response to the binary question, “Is [@author-handle]’s tweet about a political or social issue?” Notably, about 30% of participants' timelines contain no political tweets.
  • ...and 47 more figures
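
The standardization described in the Figure 1 caption (a difference in mean outcomes divided by the standard deviation of outcomes in the reverse-chronological timeline, with bootstrap confidence intervals) can be illustrated with a minimal Python sketch. This is not the paper's estimation code, whose details are in the appendix it references. The function names, the independent resampling of the two groups, and the percentile interval are all assumptions made for illustration.

```python
import random
import statistics


def standardized_ate(treated, control):
    """Difference in means, scaled by the control-group standard deviation.

    `control` stands in for outcomes under the reverse-chronological timeline,
    `treated` for outcomes under the timeline being compared (hypothetical names).
    """
    diff = statistics.fmean(treated) - statistics.fmean(control)
    return diff / statistics.stdev(control)


def bootstrap_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the standardized effect.

    Assumption: the two groups are resampled independently with replacement,
    and the control standard deviation is recomputed on every resample.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        t = rng.choices(treated, k=len(treated))
        c = rng.choices(control, k=len(control))
        estimates.append(standardized_ate(t, c))
    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Calling `bootstrap_ci(treated, control)` on two lists of per-participant outcomes returns a `(lower, upper)` pair in standard-deviation units, which is the scale the TL;DR uses when it reports effects such as $0.24$ SD. A within-participant design would more naturally resample participants rather than the two groups separately; the independent resampling here is only the simplest variant.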