Content Quality vs. Attention Allocation: An LLM-Based Case Study in Peer-to-peer Mental Health Networks

Teng Ye, Hanson Yan, Xuhuan Huang, Connor Grogan, Walter Yuan, Qiaozhu Mei, Matthew O. Jackson

TL;DR

Millions of responses to mental health-related posts are analyzed, utilizing large language models to assess the multi-dimensional quality of content, including relevance, empathy, and cultural alignment, among other aspects, to reveal a mismatch between content quality and attention allocation.

Abstract

With the rise of social media and peer-to-peer networks, users increasingly rely on crowdsourced responses for information and assistance. However, the mechanisms used to rank and promote responses often favor timeliness over quality, which may result in suboptimal support for help-seekers. We analyze millions of responses to mental health-related posts, using large language models (LLMs) to assess the multi-dimensional quality of content, including relevance, empathy, and cultural alignment, among other aspects. Our findings reveal a mismatch between content quality and attention allocation: earlier responses, despite being relatively lower in quality, receive disproportionately high fractions of upvotes and visibility due to platform ranking algorithms. We demonstrate that the quality of the top-ranked responses could be improved by up to 39 percent, and that even the simplest re-ranking strategy could significantly improve the quality of top responses. This highlights the need for more nuanced ranking mechanisms that prioritize both timeliness and content quality, especially emotional engagement, in online mental health communities.

Paper Structure

This paper contains 4 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Distributions of the composite quality score and per-aspect scores show distinct patterns. The composite score follows a bell-shaped distribution, peaking between 5 and 6. Most per-aspect distributions are more skewed. A noticeable proportion of comments have low personal resonance or low empathy expression. Overall, the topical aspects of comment quality (topical alignment and lexical precision) are rated higher than the emotional aspects (empathy expression, encouragement level, and personal resonance). The topical alignment score is centered around 8, while most comments lack actionable suggestions.
  • Figure 2: Average quality score at each decile of platform ranking vs. quality-based ranking. Comment quality generally declines with the percentile of platform ranking under the same post. However, the top 10% of comments under each post, as ranked by the platform, have significantly lower overall quality (composite score, $p < .001$) and lower per-aspect quality for topical alignment ($p < .10$), empathy expression ($p < .001$), encouragement level ($p < .001$), personal resonance ($p < .001$), and cultural alignment ($p < .001$) than the next 10%, particularly in non-topical and emotional aspects. Comments ranked in the top 10% have significantly higher lexical precision ($p < .001$) than those in the next decile. Actionable suggestions do not differ significantly between comments in the first and second deciles ($p = 0.39$). In the hypothetical setting where the comments under a post are ranked by their composite quality scores, the average quality of the top-ranked comments can be significantly boosted in all seven aspects, particularly in the emotional aspects (empathy expression, encouragement level, and personal resonance).
  • Figure 3: Distribution of comment timeliness in relation to the original post. Comment response time is binned in 30-minute intervals and labeled in hours. 4.35% (7.80%) of comments were posted within the first 30 minutes (1 hour) after the original post. The frequency of comments declines after this initial hour. Additionally, 93.47% of comments were posted within the first 24 hours of the original post. Comments generated by official Reddit bots are excluded from this plot.
  • Figure 4: Average comment quality in relation to comment response time. Comment response time is binned in 30-minute intervals and labeled in hours. Comments posted within the first 30 minutes are of significantly lower quality than those posted in the next 30 minutes ($p < .05$ for all eight quality measurements). Average comment quality remains stable after the first hour, except for personal resonance and empathy expression, where the quality scores show a slight increase over time.
  • Figure 5: The first 30% of comments on a post receive more upvotes, even after accounting for the time effect. The upvote share of a comment is the number of upvotes it received divided by the total number of upvotes received by all comments under the same post. The green line represents the average upvote share received by a comment within each decile. The orange line shows the average share of simulated upvotes a comment in each decile receives, assuming upvotes arrive evenly over time and are randomly distributed among available comments. The orange line does not diminish for the latest comments (100%), as upvotes keep coming in after all comments are posted. The observed data shows a more skewed allocation of upvotes towards early comments, compared with simulated upvotes.
  • ...and 1 more figure
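
The upvote share and the simulated baseline described in the Figure 5 caption can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the `horizon` parameter, the use of uniformly random arrival times to stand in for "upvotes arrive evenly over time", and the choice to drop upvotes that arrive before any comment exists are all assumptions made for illustration.

```python
import random


def upvote_shares(upvotes):
    """Upvote share: each comment's upvotes divided by the post's total."""
    total = sum(upvotes)
    if total == 0:
        return [0.0] * len(upvotes)
    return [u / total for u in upvotes]


def simulate_upvote_shares(comment_times, n_upvotes, horizon, rng):
    """Hypothetical baseline: each upvote arrives at a uniformly random time
    in [0, horizon] and goes to a uniformly random comment already posted.

    comment_times: posting time of each comment, measured from the post.
    Upvotes arriving before the first comment are dropped (an assumption).
    """
    counts = [0] * len(comment_times)
    for _ in range(n_upvotes):
        t = rng.uniform(0, horizon)
        available = [i for i, ct in enumerate(comment_times) if ct <= t]
        if available:
            counts[rng.choice(available)] += 1
    return upvote_shares(counts)


# Two comments: one posted immediately, one posted halfway through the window.
rng = random.Random(0)
simulated = simulate_upvote_shares([0.0, 50.0], 10_000, 100.0, rng)
observed = upvote_shares([3, 1])  # [0.75, 0.25]
```

Even under this time-only baseline the earlier comment collects a larger share, simply because it is available for longer. The caption's claim is that observed shares are more skewed toward early comments than such a baseline predicts.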