
The Laziness of the Crowd: Effort Aversion Among Raters Risks Undermining the Efficacy of X's Community Notes Program

Morgan Wack, Patrick Warren, Mustafa Alam

Abstract

Crowdsourced moderation systems like Twitter/X's Community Notes program have been proposed as scalable alternatives to professional fact-checkers for combating online misinformation. While prior research has examined the effectiveness of such systems in reducing engagement with false content and their vulnerability to partisan bias, we identify a previously untested mechanism linking fact-check difficulty to systematic non-participation by crowdsourced raters. We hypothesize that claims requiring less cognitive effort to evaluate, specifically those that are obviously false and easy to refute, are more likely to receive public notes than claims that are more plausible and require greater effort to debunk. Using eighteen months of vaccine-related Community Notes data (2,250 posts) and ratings from 382 survey participants, we show that claims perceived as more difficult to fact-check are significantly less likely to receive notes that achieve "helpful"/public status. After conducting additional analyses and running an LLM-based fact-checking pipeline to help rule out alternative explanations, we interpret this pattern as consistent with an unwillingness among raters to invest the mental effort required to evaluate and rate notes for more plausible misinformation. These findings suggest that crowdsourced moderation may systematically fail to address the plausible misinformation that is most likely to deceive. We discuss implications for platform design and propose mechanisms to mitigate this difficulty penalty in crowdsourced content moderation systems.

Paper Structure (38 sections, 10 figures, 17 tables)

This paper contains 38 sections, 10 figures, 17 tables.

Figures (10)

  • Figure 1: Fact-Check Difficulty and Rates of Community Note Application. Note: Solid line represents the share of posts receiving at least one helpful Community Note at each level of perceived fact-check difficulty, with shaded region representing 95% confidence intervals. Light bars show the share of the sample at each difficulty level. $N$ = 3,512 individual judgments across 2,250 posts.
  • Figure 2: Effect of Fact-Check Difficulty on Community Notes Outcomes. Note: OLS coefficients on fact-check difficulty score with robust standard errors clustered at the post level. Controls: log(followers), log(retweets), log(likes). Panel B outcomes measured on the highest-rated note per post. Filled markers indicate $p < .05$; hollow markers indicate $p \geq .05$.
  • Figure 3: Fact-Check Difficulty and Believability. Note: Solid lines represent mean judgments of how believable the underlying post is to the judges themselves (self-rated) and how believable they judge it to be to others (perceived audience). Shaded regions represent 95% confidence intervals.
  • Figure 4: Note Application Rate by Plausibility $\times$ Difficulty. Note: Cell area is proportional to sample size ($n$); color indicates note application rate (red = less protection, blue = more protection). The largest cell, containing high-plausibility, hard-to-check claims ($n = 962$, 4.6%), receives notes at less than half the rate of low-plausibility, easy-to-check claims (9.4%). This pattern holds across alternative binning strategies (see Appendix Figure \ref{fig:heatmap_binning}).
  • Figure 5: Mediation Analysis: Plausibility $\rightarrow$ Difficulty $\rightarrow$ Note Failure. Note: All models control for likes, retweets, and followers on the original post. The $a$ path shows that plausible claims are perceived as harder to fact-check ($b = 0.254$, $p < .001$). The $b$ path shows that difficulty predicts note failure controlling for plausibility (OR = 0.82, $p = .004$). The direct effect of plausibility ($c'$) is attenuated and non-significant when difficulty is included (OR = 0.94, $p = .34$). Sobel test confirms significant mediation ($Z = -2.86$, $p = .004$; 48% mediated).
  • ...and 5 more figures
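
The Sobel test reported in the Figure 5 note can be illustrated with a minimal sketch. It assumes the standard first-order Sobel formula, $Z = ab / \sqrt{b^2 s_a^2 + a^2 s_b^2}$, with a normal reference distribution; the paper's exact implementation is not shown here. The function names and the example inputs below are hypothetical and are not the paper's estimates or standard errors.

```python
import math


def sobel_z(a: float, se_a: float, b: float, se_b: float) -> float:
    """First-order Sobel z-statistic for the indirect effect a * b.

    a, se_a: coefficient and standard error of the predictor -> mediator path.
    b, se_b: coefficient and standard error of the mediator -> outcome path
             (on the log-odds scale if the outcome model is logistic).
    """
    return (a * b) / math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for z under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical path estimates, chosen only to exercise the functions.
z_example = sobel_z(a=0.5, se_a=0.1, b=-0.4, se_b=0.1)
p_example = two_sided_p(z_example)
print(f"Z = {z_example:.2f}, p = {p_example:.4f}")
```

As a consistency check on the reported statistic, `two_sided_p(-2.86)` rounds to .004 under this normal approximation, which matches the $p$-value given alongside $Z = -2.86$.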