The Role of Science in the Climate Change Discussions on Reddit

Paolo Cornale, Michele Tizzani, Fabio Ciulla, Kyriaki Kalimeri, Elisa Omodei, Daniela Paolotti, Yelena Mejova

TL;DR

This study analyzes 14 years of Reddit climate-change discussions to quantify how often scientific sources are cited, comparing them to mass media and social media. It employs a large-scale dataset of posts and comments, six domain categories, unreliable-domain lists, and a Random Forest model with SHAP explanations to relate user behavior to science-link sharing. Key findings show scientific URLs are rare but rising, are concentrated in comments and in center-left audiences, and are rarely triggered by unreliable or strongly partisan posts. The work highlights platform dynamics and has implications for science communication and misinformation mitigation in online public deliberation.

Abstract

Collective and individual action necessary to address climate change hinges on the public's understanding of the relevant scientific findings. In this study, we examine the use of scientific sources over 14 years of public deliberation around climate change on one of the largest social media platforms, Reddit. We find that only 4.0% of the links in Reddit posts, and 6.5% of those in comments, point to domains of scientific sources, although these rates have increased over time. These links are dwarfed, however, by citations of mass media, newspapers, and social media, the last of which peaked during 2019-2020. Further, scientific sources are more likely to be posted by users who also post links to sources with a center-left political leaning, and less likely to be posted by those who share more polarized sources. Unfortunately, scientific sources are not often used in response to links to unreliable sources.
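The headline percentages above are shares of links whose domain falls into a given category. A minimal sketch of that kind of tally follows, assuming a hand-made domain lookup: the `DOMAIN_CATEGORY` table, the category labels, and the helper names are hypothetical and do not reproduce the study's actual domain lists or pipeline.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical domain-to-category lookup (illustrative only, not the study's lists).
DOMAIN_CATEGORY = {
    "nature.com": "science",
    "science.org": "science",
    "nytimes.com": "news",
    "youtube.com": "social",
}


def domain_of(url: str) -> str:
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def category_shares(urls):
    """Map each category to the fraction of URLs whose domain belongs to it."""
    counts = Counter(DOMAIN_CATEGORY.get(domain_of(u), "other") for u in urls)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()} if total else {}


# Toy input: one science link, one news link, one social link, one unlisted domain.
urls = [
    "https://www.nature.com/articles/x",
    "https://www.nytimes.com/2019/climate",
    "https://youtube.com/watch?v=abc",
    "https://example.org/blog",
]
shares = category_shares(urls)
```

Running the same tally separately on post links and on comment links would give the two kinds of rates the abstract compares.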

Paper Structure

This paper contains 19 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Statistics of URL usage in the dataset: (a) proportion of URLs in each category, shown separately for posts (prefixed with "P") and comments (prefixed with "C" and dashed); (b) engagement with posts containing a URL of a particular category, measured as the percentage of posts with at least one comment, the average number of comments for posts with at least one comment, and the average comment length in words; (c-d) proportion of URLs in each category in posts and comments, over time.
  • Figure 2: Distributions of the proportion of URLs posted by a user that are from a particular category, grouped by users whose URLs have a particular political leaning. Under each distribution, the mean proportion is shown, and a * is shown between two consecutive groups if their distributions differ using the Kolmogorov-Smirnov two-sample test at $p<0.001$.
  • Figure 3: SHapley Additive exPlanations (SHAP), a game-theoretic approach for explaining the contribution of each feature to the final output of an ML model [NIPS2017_7062]. The Random Forest model predicts how many (log-normalized) science-related URLs a user has posted in our dataset (here, "high" means more URLs were posted), using behavioral features including the categories of the other URLs they shared, as well as the political leaning of those URLs. The 100 most popular subreddits are used as features, and all others are summed in "other subreddits".
  • Figure 4: Conditional probability of a first-level comment with a URL containing a certain URL category (columns), given it is in response to a post having a scientific, unreliable, right- or left-leaning URL (rows).
  • Figure 5: Case study of select subreddits, (a) the percentage of URLs having a particular domain category and (b) the percentage of URLs having a particular political leaning. Statistics are shown separately for posts (P) and comments (C).
  • ...and 2 more figures
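
The conditional probabilities described for Figure 4 can be illustrated by row-normalizing counts of (post category, comment category) pairs. This is only a sketch under stated assumptions: the function name, the category labels, and the toy pairs below are hypothetical, and the paper's actual computation may differ.

```python
from collections import Counter, defaultdict


def conditional_probabilities(pairs):
    """Estimate P(comment category | post category) from observed pairs.

    pairs: iterable of (post_category, comment_category) tuples, one per
    first-level comment that contains a URL.
    """
    counts = defaultdict(Counter)
    for post_cat, comment_cat in pairs:
        counts[post_cat][comment_cat] += 1
    # Normalize each row so the probabilities for a given post category sum to 1.
    return {
        post_cat: {c: n / sum(row.values()) for c, n in row.items()}
        for post_cat, row in counts.items()
    }


# Toy data (hypothetical): categories of post URLs and of the URLs in replies.
pairs = [
    ("unreliable", "news"),
    ("unreliable", "news"),
    ("unreliable", "science"),
    ("unreliable", "social"),
    ("science", "science"),
    ("science", "news"),
]
table = conditional_probabilities(pairs)
```

Each row of `table` corresponds to one post category (a row of the figure), and each entry in that row to one comment category (a column).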