TUBERAIDER: Attributing Coordinated Hate Attacks on YouTube Videos to their Source Communities
Mohammad Hammas Saeed, Kostantinos Papadamou, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini
TL;DR
This work addresses the problem of attributing coordinated hate raids on YouTube to their source communities, enabling more context-aware moderation. It introduces TubeRaider, a three-stage system that (i) learns per-community language via TF-IDF, (ii) detects peaks in YouTube comment activity after a link is posted, and (iii) attributes the attack to a source community using a multi-class classifier with 60 language-based features. The approach achieves over 75% attribution accuracy and is evaluated both with cross-validation and on in-the-wild data, including case studies from /pol/, r/The_Donald, and Incels-related communities. The findings support the viability of language-driven attribution as a component of moderation strategies, while also highlighting limitations such as language overlap between communities and the need for threshold tuning. Overall, TubeRaider demonstrates that combining cross-platform activity signals with community-specific linguistic signals can effectively identify the origin of targeted hate campaigns, informing safer and more nuanced moderation decisions.
Abstract
Alas, coordinated hate attacks, or raids, are becoming increasingly common online. In a nutshell, these are perpetrated by a group of aggressors who organize and coordinate operations on one platform (e.g., 4chan) to target victims on another (e.g., YouTube). In this paper, we focus on attributing raids to their source community, paving the way for moderation approaches that take the context (and potentially the motivation) of an attack into consideration. We present TUBERAIDER, an attribution system achieving over 75% accuracy in detecting and attributing coordinated hate attacks on YouTube videos. We instantiate it using links to YouTube videos shared on 4chan's /pol/ board, r/The_Donald, and 16 Incels-related subreddits. We use a peak detector to identify a rise in the comment activity of a YouTube video, which signals that an attack may be occurring. We then train a machine learning classifier based on the community language (i.e., TF-IDF scores of relevant keywords) to perform the attribution. We test TUBERAIDER in the wild and present a few case studies of actual attacks that it identified to showcase its effectiveness.
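To make the pipeline described above more concrete, the following is a minimal Python sketch of its two core ideas: flagging a spike in a video's comment activity, and scoring the comments against per-community TF-IDF keyword profiles. It is an illustration under stated assumptions, not the system's actual implementation. The function names (`tfidf_profiles`, `detect_peaks`, `attribute`), the thresholds (`window`, `k`, `min_count`), and the toy data are all hypothetical, and the attribution step here is a simple highest-score rule standing in for the trained multi-class classifier.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev


def tokenize(text):
    """Lowercase the text and keep alphabetic runs only."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_profiles(corpora):
    """Map {community: [post, ...]} to {community: {term: tf-idf weight}}.

    Assumption: each community's posts are pooled into one document, so
    terms used by every community get an IDF (and weight) of zero.
    """
    counts = {
        name: Counter(tok for post in posts for tok in tokenize(post))
        for name, posts in corpora.items()
    }
    n_docs = len(counts)
    doc_freq = Counter(term for c in counts.values() for term in c)
    profiles = {}
    for name, c in counts.items():
        total = sum(c.values())
        profiles[name] = {
            term: (freq / total) * math.log(n_docs / doc_freq[term])
            for term, freq in c.items()
        }
    return profiles


def detect_peaks(comment_counts, window=3, k=2.0, min_count=10):
    """Return indices of time bins whose comment count spikes.

    A bin is flagged when it exceeds both `min_count` and the mean plus
    `k` standard deviations of the preceding `window` bins. All three
    thresholds are illustrative and would need tuning on real data.
    """
    peaks = []
    for i in range(window, len(comment_counts)):
        baseline = comment_counts[i - window:i]
        threshold = max(mean(baseline) + k * pstdev(baseline), min_count)
        if comment_counts[i] > threshold:
            peaks.append(i)
    return peaks


def attribute(comments, profiles):
    """Pick the community whose TF-IDF profile best matches the comments.

    Returns None when no community-specific term appears at all.
    """
    tokens = [tok for comment in comments for tok in tokenize(comment)]
    scores = {
        name: sum(weights.get(tok, 0.0) for tok in tokens)
        for name, weights in profiles.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Toy usage with made-up community names and vocabulary.
corpora = {
    "community_a": ["alpha beta shared", "alpha alpha"],
    "community_b": ["gamma delta shared"],
}
profiles = tfidf_profiles(corpora)
hourly_comments = [2, 3, 2, 3, 40, 4]
peak_bins = detect_peaks(hourly_comments)
source = attribute(["alpha gamma alpha"], profiles)
```

In this sketch, attribution would only be attempted for videos where `detect_peaks` returns at least one bin, mirroring the idea that a rise in comment activity signals a possible attack. A term such as `shared`, which appears in both toy corpora, contributes nothing to either score, which hints at why vocabulary overlap between communities makes attribution harder.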
