Uncovering the Internet's Hidden Values: An Empirical Study of Desirable Behavior Using Highly-Upvoted Content on Reddit

Agam Goyal, Charlotte Lambert, Yoshee Jain, Eshwar Chandrasekharan

TL;DR

This study asks how to identify desirable online behavior beyond traditional prosocial metrics by treating upvotes as a signal of community approval. It uses a GPT-4o-driven framework to extract macro, meso, and micro values from 16,000 highly-upvoted Reddit comments across 80 subreddits in 2016 and 2022, yielding 64 and 72 values respectively. The authors show that existing prosociality measures fail to capture, on average, 82% of these values, underscoring the need for nuanced, community-specific models of desirability. Their framework aligns with prior taxonomies at broader scales while revealing new meso and micro values, giving moderators and researchers a scalable way to quantify and surface desirable content. The work points to cross-platform extensions and encourages the development of value-grounded moderation tools that reflect real-world community norms.

Abstract

A major task for moderators of online spaces is norm-setting, essentially creating shared norms for user behavior in their communities. Platform design principles emphasize the importance of highlighting norm-adhering examples and explicitly stating community norms. However, norms and values vary between communities and go beyond content-level attributes, making it challenging for platforms and researchers to provide automated ways to identify desirable behavior to be highlighted. Current automated approaches to detect desirability are limited to measures of prosocial behavior, but we do not know whether these measures fully capture the spectrum of what communities value. In this paper, we use upvotes, which express community approval, as a proxy for desirability and examine 16,000 highly-upvoted comments across 80 popular sub-communities on Reddit. Using a large language model, we extract values from these comments across two years (2016 and 2022) and compile 64 and 72 $\textit{macro}$, $\textit{meso}$, and $\textit{micro}$ values for 2016 and 2022 respectively, based on their frequency across communities. Furthermore, we find that existing computational models for measuring prosociality were inadequate to capture on average $82\%$ of the values we extracted. Finally, we show that our approach can not only extract most of the qualitatively-identified values from prior taxonomies, but also uncover new values that are actually encouraged in practice. Our findings highlight the need for nuanced models of desirability that go beyond preexisting prosocial measures. This work has implications for improving moderator understanding of their community values and provides a framework that can supplement qualitative approaches with larger-scale content analyses.

Paper Structure

This paper contains 62 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Plot representing the macro and meso values extracted from 80 subreddits in $\mathcal{D}_{\text{2016}}$ (left) and $\mathcal{D}_{\text{2022}}$ (right). Macro values manifest in at least 60 subreddits, while meso values manifest in between 20 and 60 subreddits. In 2016 we find 3 macro values and 13 meso values, while in 2022 we find 2 macro values and 20 meso values. The dashed line separates the macro and meso scales.
  • Figure 2: Plot depicting odds ratios from logistic regression analysis on subreddits, plotted against the number of subscribers of each subreddit for $\mathcal{D}_{\text{2016}}$ and $\mathcal{D}_{\text{2022}}$. An odds ratio greater than 1 indicates a positive relationship between prosociality measures and the likelihood of getting "high" upvotes. We see that for the majority of the subreddits in both years (76.3% in $\mathcal{D}_{\text{2016}}$ and 82% in $\mathcal{D}_{\text{2022}}$), prosociality alone does not increase the likelihood of being highly upvoted. Only subreddits with statistically significant results are plotted.
  • Figure 3: Plot depicting quantiles of the score and two thresholds marked in red at $0.9$ and $0.95$. The first significant rise in the score occurs at $0.95$ which we therefore use as the threshold for "high" upvote comments. Since there is little to no rise until $0.7$, we use that as our threshold for "low" upvote comments.
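
The thresholds in the captions for Figures 1 and 3 can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the handling of boundary cases (exactly 20 or 60 subreddits), the "micro" cutoff of fewer than 20 subreddits, and the linear-interpolation quantile are all assumptions made for illustration; only the numbers 60, 20, 0.95, and 0.7 come from the captions.

```python
from typing import List, Optional

# Thresholds taken from the Figure 1 caption; how ties at the boundaries
# are resolved is an assumption of this sketch.
MACRO_MIN_SUBREDDITS = 60
MESO_MIN_SUBREDDITS = 20


def value_scale(n_subreddits: int) -> str:
    """Assign a scale to a value from the number of subreddits it appears in."""
    if n_subreddits >= MACRO_MIN_SUBREDDITS:
        return "macro"
    if n_subreddits >= MESO_MIN_SUBREDDITS:
        return "meso"
    return "micro"  # assumed: anything below the meso range


def quantile(sorted_scores: List[float], q: float) -> float:
    """Quantile of an ascending, non-empty list using linear interpolation."""
    pos = q * (len(sorted_scores) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_scores) - 1)
    frac = pos - lo
    return sorted_scores[lo] + frac * (sorted_scores[hi] - sorted_scores[lo])


def label_comments(
    scores: List[float], high_q: float = 0.95, low_q: float = 0.70
) -> List[Optional[str]]:
    """Label each comment score as "high", "low", or None (in between).

    The 0.95 and 0.7 quantile cutoffs follow the Figure 3 caption; computing
    them over whatever list is passed in is an assumption of this sketch.
    """
    if not scores:
        return []
    ordered = sorted(scores)
    high_cut = quantile(ordered, high_q)
    low_cut = quantile(ordered, low_q)
    labels: List[Optional[str]] = []
    for score in scores:
        if score >= high_cut:
            labels.append("high")
        elif score <= low_cut:
            labels.append("low")
        else:
            labels.append(None)
    return labels


if __name__ == "__main__":
    print(value_scale(72), value_scale(35), value_scale(4))
    labels = label_comments(list(range(1, 101)))
    print(labels.count("high"), labels.count("low"), labels.count(None))
```

Comments that fall between the two cutoffs are left unlabeled here, which keeps the "high" and "low" groups clearly separated; whether the paper discards or otherwise uses those comments is not stated in this excerpt.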