Hostile Counterspeech Drives Users From Hate Subreddits
Daniel Hickey, Matheus Schmitz, Daniel M. T. Fessler, Paul E. Smaldino, Kristina Lerman, Goran Murić, Keith Burghardt
TL;DR
This study investigates whether counterspeech affects participation in online hate communities on Reddit and whether certain counterspeech tactics are more effective. It builds a specialized counterspeech detection model using a newly annotated dataset from 25 hate subreddits, augmented with karma and subreddit context, and uses Mahalanobis matching to causally compare newcomers who receive hostile counterspeech, non-hostile counterspeech, or in-group replies. The results show hostile counterspeech substantially reduces newcomer engagement in hate subreddits (ERR ≈ 0.88), while non-hostile counterspeech has little effect on engagement; general Reddit retention remains largely unaffected. The work provides a publicly available dataset and code, highlighting ethical considerations and the need for nuanced counterspeech strategies to mitigate harms while reducing hate online.
Abstract
Counterspeech -- speech that opposes hate speech -- has gained significant attention recently as a strategy to reduce hate on social media. While previous studies suggest that counterspeech can somewhat reduce hate speech, little is known about its effects on participation in online hate communities, or about which counterspeech tactics reduce harmful behavior. We begin to address these gaps by identifying 25 large hate communities ("subreddits") within Reddit and analyzing the effect of counterspeech on newcomers within these communities. We first construct a new public dataset of carefully annotated counterspeech and non-counterspeech comments within these subreddits. We use this dataset to train a state-of-the-art counterspeech detection model. Next, we use matching to evaluate the causal effects of hostile and non-hostile counterspeech on the engagement of newcomers in hate subreddits. We find that, while non-hostile counterspeech is ineffective at getting users to fully disengage from these hate subreddits, a single hostile counterspeech comment substantially reduces the likelihood of future engagement. While offering nuance to the understanding of counterspeech efficacy, these results a) leave unanswered the question of whether hostile counterspeech dissuades newcomers from participation in online hate writ large, or merely drives them into less-moderated and more extreme hate communities, and b) raise ethical considerations about hostile counterspeech, which is comparatively common and might exacerbate rather than mitigate the net level of antagonism in society. These findings underscore the importance of future work to improve counterspeech tactics and minimize unintended harm.
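
The matching step mentioned above can be illustrated with a minimal sketch of Mahalanobis nearest-neighbor matching. This is not the authors' implementation: the function name `mahalanobis_match`, the choice of matching with replacement, the two covariates, and the toy numbers are all assumptions made for illustration only.

```python
import numpy as np

def mahalanobis_match(treated, control):
    """Match each treated row to its nearest control row (with replacement).

    Hypothetical helper: rows are users, columns are pre-treatment covariates.
    """
    pooled = np.vstack([treated, control])
    # Covariance of the pooled covariates; pinv guards against singularity.
    cov = np.atleast_2d(np.cov(pooled, rowvar=False))
    cov_inv = np.linalg.pinv(cov)
    # Pairwise differences, shape (n_treated, n_control, n_covariates).
    diff = treated[:, None, :] - control[None, :, :]
    # Squared Mahalanobis distance for every treated/control pair.
    d2 = np.einsum("tcj,jk,tck->tc", diff, cov_inv, diff)
    d2 = np.maximum(d2, 0.0)  # clip tiny negative rounding errors
    idx = d2.argmin(axis=1)
    dist = np.sqrt(d2[np.arange(len(treated)), idx])
    return idx, dist

# Toy data (made up): columns could stand for prior comment count and karma.
treated = np.array([[2.0, 30.0], [8.0, 5.0]])
control = np.array([[2.5, 28.0], [7.5, 6.0], [5.0, 60.0], [1.0, 2.0]])

idx, dist = mahalanobis_match(treated, control)
print(idx, dist)
```

Each treated newcomer (for example, one who received hostile counterspeech) is paired with the most similar control newcomer, so that later engagement can be compared within pairs. A real analysis would also check covariate balance after matching and typically discard pairs whose distance exceeds a caliper.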
