Here Be Livestreams: Trade-offs in Creating Temporal Maps of Reddit

Virginia Partridge, Jasmine Mangat, Rebecca Curran, Ryan McGrady, Ethan Zuckerman

TL;DR

This is the first study to qualitatively analyze how clusterings of subreddits are perceived by social media researchers at a Reddit-wide scale and to identify particularly stable communities during 2021-2022, such as the Reddit Public Access Network, as well as emerging communities, like one focused on NFT trading.

Abstract

We present a method for mapping Reddit communities that accounts for temporal shifts, using quantitative and qualitative analyses of clustering techniques to produce high-quality, stable, and meaningful maps for researchers, journalists and casual Reddit users. Building on previous work using community embeddings, we find that only a month of Reddit comments suffices to create snapshot embeddings that maintain quality while supporting insight into changes in Reddit communities over time. Comparing different clusterings of community embeddings with quantitative measures of quality and temporal stability, we describe properties of the models and what they tell us about the underlying Reddit data. Moreover, qualitative analysis of the resulting clusters illuminates which properties of clusterings are useful for analysis of Reddit communities. Although clusterings of subreddits have been used in many earlier works, we believe this is the first study to qualitatively analyze how these clusterings are perceived by social media researchers at a Reddit-wide scale. Finally, we demonstrate how the temporal snapshots might be used in exploratory study. We are able to identify particularly stable communities during 2021-2022, such as the Reddit Public Access Network, as well as emerging communities, like one focused on NFT trading. This work informed the development of a webtool for exploring Reddit, now available to the public at RedditMap.social.

Paper Structure

This paper contains 18 sections, 6 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Total number of comments, unique user contexts from each month's data snapshot, and Precision@5 performance of the best community embedding trained on that snapshot. P@5 performance is consistently high across months, averaging 0.64 (std. dev. 0.012) and never dropping below 0.61.
  • Figure 2: Proportion of analogies solved according to Precision@K out of the total solvable in top 10K most commented subreddits during each month. Dark horizontal bars indicate the approximate dates of each sport season, including playoffs and finals.
  • Figure 3: Intrinsic measures of quality of clustering models, where a single model for each type is trained from each month's snapshot embedding for varying numbers of clusters, showing plots of scores averaged over the year. Higher is better for Silhouette and lower is better for Davies-Bouldin.
  • Figure 4: Histogram of average Jaccard Similarity of each subreddit's 20 nearest neighbors under community embedding snapshot models in adjacent months. Vertical dashed lines indicate mean and one standard deviation above and below. The most stable communities, including Reddit Public Access Network, appear on the right.
  • Figure 5: The nearest neighbors of r/opensea become more stable during the data's time frame, starting near a variety of subreddits devoted to art and crypto and ending amongst subreddits devoted to NFT trading.
  • ...and 3 more figures
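
The stability measure named in the Figure 4 caption (average Jaccard similarity of a subreddit's nearest neighbors in adjacent monthly snapshots) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each snapshot is a plain dictionary mapping subreddit names to embedding vectors and that neighbors are ranked by cosine similarity, neither of which the caption specifies. The function names and the toy data are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbors(embeddings, name, k):
    """Set of the k subreddits most similar to `name` (assumed: cosine ranking)."""
    target = embeddings[name]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embeddings.items()
        if other != name
    ]
    scored.sort(reverse=True)  # highest similarity first
    return {other for _, other in scored[:k]}


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b| of two sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def neighbor_stability(snapshots, name, k=20):
    """Average Jaccard similarity of `name`'s k nearest neighbors
    over each pair of adjacent snapshots in which it appears."""
    scores = []
    for earlier, later in zip(snapshots, snapshots[1:]):
        if name in earlier and name in later:
            scores.append(
                jaccard(
                    nearest_neighbors(earlier, name, k),
                    nearest_neighbors(later, name, k),
                )
            )
    return sum(scores) / len(scores) if scores else None


# Toy snapshots (hypothetical data): subreddit "a" drifts in the third month.
month1 = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}
month2 = dict(month1)
month3 = {"a": [0.0, 1.0], "b": [0.9, 0.1], "c": [0.0, 1.0], "d": [0.1, 0.9]}

stable = neighbor_stability([month1, month2], "a", k=1)
drifted = neighbor_stability([month2, month3], "a", k=1)
overall = neighbor_stability([month1, month2, month3], "a", k=2)
```

Under these assumptions, a score near 1 means a subreddit keeps the same neighbors from month to month, while a score near 0 means its neighborhood has turned over almost completely.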