Predictively Combatting Toxicity in Health-related Online Discussions through Machine Learning
Jorge Paz-Ruza, Amparo Alonso-Betanzos, Bertha Guijarro-Berdiñas, Carlos Eiras-Franco
TL;DR
This work reframes online toxicity mitigation in health contexts as a predictive, dyadic task between users and subcommunities. By combining Detoxify-derived toxicity signals with a Matrix Factorization–based Collaborative Filtering model, it learns latent toxicity characteristics of users and subreddits to forecast future interactions, framing the problem as binary classification. The authors introduce a novel adaptation of the LOLI data-splitting method for binary dyadic tasks and demonstrate that their approach achieves ~0.83 G-Mean on Reddit COVID-related data, outperforming simple baselines. The results suggest that pre-emptively steering users away from potentially toxic subforums could reduce harmful content and moderation costs, with future work aiming to extend the approach to other platforms and incorporate richer textual/temporal cues.
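For readers unfamiliar with the reported metric, the following is a minimal sketch of G-Mean under its usual binary-classification definition: the geometric mean of sensitivity (true positive rate) and specificity (true negative rate). The function name `g_mean` and the convention that label 1 means "toxic" are assumptions made for illustration, not taken from the authors' code.

```python
import math

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity and specificity for 0/1 labels.

    Assumption: 1 marks a toxic interaction, 0 a non-toxic one.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Guard against empty classes to avoid division by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)
```

Because the two rates are multiplied, a classifier that always predicts the majority class scores 0, which makes the metric suitable for imbalanced data where toxic interactions are the minority.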
Abstract
In health-related topics, user toxicity in online discussions frequently becomes a source of social conflict or promotes dangerous, unscientific behaviour. Common approaches for battling it include different forms of detection, flagging and/or removal of existing toxic comments, which are often counterproductive for platforms and users alike. In this work, we propose the alternative of combatting user toxicity predictively, anticipating where a user could interact toxically in health-related online discussions. Applying a Collaborative Filtering-based Machine Learning methodology, we predict the toxicity of COVID-related conversations between any user and any subcommunity of Reddit, surpassing 80% in relevant predictive performance metrics. This allows us to prevent the pairing of conflicting users and subcommunities.
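To illustrate the general idea of a Collaborative Filtering approach to this dyadic task, the sketch below fits a plain logistic matrix factorization on (user, subreddit) pairs with binary toxicity labels. It is not the authors' model: the function names (`train_mf`, `predict_proba`), the hyperparameters, the full-batch gradient descent, and the toy data are all assumptions made for illustration. The labels are assumed to have been computed beforehand, for example by thresholding a toxicity score for each user's comments in a subreddit.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_mf(pairs, labels, n_users, n_subs, k=4, lr=0.5, epochs=300, seed=0):
    """Learn latent factors for users and subreddits by minimizing log loss.

    pairs:  int array of shape (N, 2) holding (user_id, subreddit_id)
    labels: float array of shape (N,), 1.0 = toxic interaction (assumed precomputed)
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))  # user latent factors
    Q = rng.normal(scale=0.1, size=(n_subs, k))   # subreddit latent factors
    u, s = pairs[:, 0], pairs[:, 1]
    for _ in range(epochs):
        # Prediction error per observed pair, shape (N, 1).
        err = (sigmoid(np.sum(P[u] * Q[s], axis=1)) - labels)[:, None]
        grad_P = np.zeros_like(P)
        grad_Q = np.zeros_like(Q)
        # Accumulate gradients; add.at handles repeated user/subreddit ids.
        np.add.at(grad_P, u, err * Q[s])
        np.add.at(grad_Q, s, err * P[u])
        P -= lr * grad_P / len(labels)
        Q -= lr * grad_Q / len(labels)
    return P, Q

def predict_proba(P, Q, pairs):
    """Estimated probability that each (user, subreddit) pair is toxic."""
    return sigmoid(np.sum(P[pairs[:, 0]] * Q[pairs[:, 1]], axis=1))

# Hypothetical toy data: 4 users, 3 subreddits, 8 observed interactions.
toy_pairs = np.array([[0, 0], [0, 1], [1, 0], [1, 2],
                      [2, 1], [2, 2], [3, 0], [3, 2]])
toy_labels = np.array([1, 0, 1, 0, 0, 0, 1, 0], dtype=float)

P, Q = train_mf(toy_pairs, toy_labels, n_users=4, n_subs=3)
# Score an unseen pairing: user 2 in subreddit 0.
unseen_prob = predict_proba(P, Q, np.array([[2, 0]]))
```

Scoring unseen pairs is what would make pre-emptive steering possible: a pairing with a high predicted probability could be deprioritized before any toxic comment is written.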
