Exploring Climate Change Discourse: Measurements and Analysis of Reddit Data
Smriti Janaswamy, Jeremy Blackburn
TL;DR
The paper investigates climate change discourse on Reddit by analyzing 11 climate-related subreddits from 2014 to 2022. It expands a seed set of subreddits using pretrained embeddings and UMAP/HDBSCAN clustering to assemble a broad collection of climate conversations, then applies BERTopic-based topic modeling to post titles to track yearly themes. Named Entity Recognition identifies frequently referenced locations, events, and laws (e.g., the Paris Agreement). The analysis reveals persistent topics such as wildfires, renewables, and climate policy, with Europe often cited. The study demonstrates the viability of large-scale social-media analyses for mapping evolving climate narratives and highlights persistent concerns, policy threads, and regional framing, while acknowledging data and methodological limitations.
Abstract
Social media facilitates conversations about important topics and surfaces insights and issues related to them. Reddit hosts discussions on a wide array of topics, and these discussions shape the narratives that form around those topics. One such topic is climate change, which is discussed extensively on Reddit, indicating high interest in its various aspects. In this paper, we explore 11 subreddits that discuss climate change from 2014 to 2022 and conduct a data-driven analysis of their posts. We present a basic characterization of the data and show the distribution of posts and authors across our dataset for all years. We also analyze user engagement metrics, such as post scores, and how they change over time. Finally, we offer insights into the topics discussed across the subreddits and the entities referenced throughout the dataset.
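The basic characterization described above (posts, authors, and scores per year) could be computed along the following lines. This is a minimal sketch, not the authors' code: the field names `author`, `created_utc`, and `score` are assumptions about how the post records are stored, the function name `characterize_by_year` is hypothetical, the use of the median as the score summary is our choice, and the sample records are invented for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import median


def characterize_by_year(posts):
    """Summarize posts per year: post count, unique authors, median score.

    Each post is assumed to be a dict with 'author', 'created_utc'
    (Unix seconds), and 'score' keys; these names are assumptions.
    """
    buckets = defaultdict(lambda: {"authors": set(), "scores": []})
    for post in posts:
        # Convert the Unix timestamp to a calendar year in UTC.
        year = datetime.fromtimestamp(post["created_utc"], tz=timezone.utc).year
        buckets[year]["authors"].add(post["author"])
        buckets[year]["scores"].append(post["score"])
    return {
        year: {
            "posts": len(b["scores"]),
            "authors": len(b["authors"]),
            "median_score": median(b["scores"]),
        }
        for year, b in sorted(buckets.items())
    }


# Hypothetical records, invented for illustration only.
sample_posts = [
    {"author": "user_a", "created_utc": 1420070400, "score": 12},  # 2015-01-01
    {"author": "user_b", "created_utc": 1425168000, "score": 3},   # 2015-03-01
    {"author": "user_a", "created_utc": 1609459200, "score": 40},  # 2021-01-01
]
summary = characterize_by_year(sample_posts)
for year, stats in summary.items():
    print(year, stats)
```

The median is used here because post scores on social platforms tend to be heavily skewed; a mean or a full distribution plot would be equally reasonable depending on the question being asked.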
