Analyzing Toxicity in Deep Conversations: A Reddit Case Study
Vigneshwaran Shankaran, Rajesh Sharma
TL;DR
This study addresses toxicity propagation in deep online conversations by modeling Reddit threads as rooted trees and scoring the toxicity of each node with a RoBERTa-based classifier trained on ToxiGen, cross-validated against Perspective API. The central metric, Toxic Accumulation ($TA$), is defined recursively: $TA_{SubTree}(Node_i) = \sum_{j=1}^{\# Children} TA({Node_i}_j)$, where ${Node_i}_j$ is the $j$-th child of $Node_i$, and $TA(Node_i) = \frac{Toxicity(Node_i) + TA_{SubTree}(Node_i)}{\# Children + 1}$. This enables analysis of how toxicity spreads through a conversation. Regression analyses link current toxicity to the toxicity of preceding levels via $T = \beta_0 + \sum_i \beta_i T_{Level_i}$, revealing that the immediate predecessor has the strongest influence and that toxicity often decays after depth 3, with notable variations across subreddits (e.g., r/4chan). Across datasets, the study finds a significant positive correlation between a node's toxicity and its toxic accumulation (mean $r \approx 0.631$; leaves $r \approx 0.9$), indicating that toxic comments are predictive of escalating toxicity in their sub-branches. The results, including a comparison between consensual and non-consensual profanity, have important implications for moderation strategies and for understanding how norms shape public discourse on platforms that support deep conversations.
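The recursive definition of $TA$ above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Node` class, the `toxic_accumulation` function, and the example scores are hypothetical, and the sketch assumes each comment already carries a toxicity score in $[0, 1]$ from some classifier.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One post or comment in a conversation tree (hypothetical structure)."""
    toxicity: float                      # assumed classifier score in [0, 1]
    children: List["Node"] = field(default_factory=list)


def toxic_accumulation(node: Node) -> float:
    """TA(Node) = (Toxicity(Node) + TA_SubTree(Node)) / (#Children + 1)."""
    # TA_SubTree: sum of TA over the direct children (0 for a leaf).
    ta_subtree = sum(toxic_accumulation(child) for child in node.children)
    return (node.toxicity + ta_subtree) / (len(node.children) + 1)


# Illustrative tree with made-up scores:
# root (0.2) -> a (0.8) -> leaf (0.6)
#            -> b (0.1)
leaf = Node(0.6)
a = Node(0.8, [leaf])
b = Node(0.1)
root = Node(0.2, [a, b])

ta_a = toxic_accumulation(a)        # (0.8 + 0.6) / 2
ta_root = toxic_accumulation(root)  # (0.2 + TA(a) + TA(b)) / 3
print(f"{ta_a:.3f} {ta_root:.3f}")  # prints "0.700 0.333"
```

Under this formula a leaf's $TA$ reduces to its own toxicity score, since it has no children. The plain recursion is adequate for a sketch; very deep threads might call for an iterative post-order traversal instead.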
Abstract
Online social media has become increasingly popular in recent years due to its ease of access and ability to connect with others. One of social media's main draws is its anonymity, allowing users to share their thoughts and opinions without fear of judgment or retribution. This anonymity has also made social media prone to harmful content, which requires moderation to ensure responsible and productive use. Several methods using artificial intelligence have been employed to detect harmful content. However, conversational and contextual analysis of hate speech is still understudied. Most promising works analyze only a single text at a time rather than the conversation surrounding it. In this work, we employ a tree-based approach to understand how users behave with respect to toxicity in public conversation settings. To this end, we collect both the posts and the comment sections of the top 100 posts from 8 Reddit communities that allow profanity, totaling over 1 million responses. We find that toxic comments increase the likelihood of subsequent toxic comments being produced in online conversations. Our analysis also shows that the immediate context, rather than the original post, plays a vital role in shaping a response. We also study the effect of consensual profanity and observe that it overlaps with non-consensual profanity in terms of user behavior and patterns.
