TUBERAIDER: Attributing Coordinated Hate Attacks on YouTube Videos to their Source Communities

Mohammad Hammas Saeed, Kostantinos Papadamou, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini

TL;DR

This work addresses the problem of attributing coordinated hate raids on YouTube to their source communities, enabling more context-aware moderation. It introduces TubeRaider, a three-stage system that (i) learns per-community language via TF-IDF, (ii) detects peaks in YouTube comment activity after a link is posted, and (iii) attributes the attack to a source community using a multi-class classifier with 60 language-based features. The approach achieves over 75% attribution accuracy and is evaluated with cross-validation and in-the-wild data, including case studies from /pol/, r/The_Donald, and Incels communities. The findings support the viability of language-driven attribution as a component of moderation strategies, while also highlighting limitations such as language overlap and the need for threshold tuning. Overall, TubeRaider demonstrates that combining cross-platform activity signals with community-specific linguistic signals can effectively identify the origin of targeted hate campaigns, informing safer and more nuanced moderation decisions.
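Stage (i), learning each community's language via TF-IDF, can be pictured with a short pure-Python sketch. It assumes each community's posts are pooled into a single "document", so a term's IDF reflects how many communities use it: terms shared by every community score zero, while community-specific vocabulary rises to the top. The input format, the whitespace tokenizer, the helper names (`community_tfidf`, `top_keywords`), and the per-community cut-off `k` are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import Counter

# Hypothetical sketch of per-community TF-IDF keyword learning.


def tokenize(text):
    # Lowercase + whitespace split; a real pipeline would also strip
    # punctuation, stopwords, URLs, and so on.
    return text.lower().split()


def community_tfidf(corpora):
    """corpora: {community: [post, ...]} -> {community: {term: tfidf}}.

    Each community's posts are pooled into one document, so IDF counts
    how many communities use a term.
    """
    counts = {name: Counter(tok for post in posts for tok in tokenize(post))
              for name, posts in corpora.items()}
    n_docs = len(counts)
    # Document frequency: number of communities in which each term appears.
    df = Counter(term for c in counts.values() for term in c)
    scores = {}
    for name, c in counts.items():
        total = sum(c.values())
        scores[name] = {term: (freq / total) * math.log(n_docs / df[term])
                        for term, freq in c.items()}
    return scores


def top_keywords(scores, k):
    """Top-k terms per community by score; ties broken alphabetically."""
    return {name: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]
            for name, s in scores.items()}


corpora = {
    "pol": ["kek based", "based thread"],
    "the_donald": ["maga rally", "maga based"],
    "incels": ["chad stacy", "chad blackpill"],
}
print(top_keywords(community_tfidf(corpora), 1))
# → {'pol': ['kek'], 'the_donald': ['maga'], 'incels': ['chad']}
```

Pooling per community is one way to make IDF reward distinctiveness between communities rather than within them. "based" is used by two of the three toy communities, so it is down-weighted relative to each community's exclusive terms.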

Abstract

Alas, coordinated hate attacks, or raids, are becoming increasingly common online. In a nutshell, these are perpetrated by a group of aggressors who organize and coordinate operations on a platform (e.g., 4chan) to target victims on another community (e.g., YouTube). In this paper, we focus on attributing raids to their source community, paving the way for moderation approaches that take the context (and potentially the motivation) of an attack into consideration. We present TUBERAIDER, an attribution system achieving over 75% accuracy in detecting and attributing coordinated hate attacks on YouTube videos. We instantiate it using links to YouTube videos shared on 4chan's /pol/ board, r/The_Donald, and 16 Incels-related subreddits. We use a peak detector to identify a rise in the comment activity of a YouTube video, which signals that an attack may be occurring. We then train a machine learning classifier based on the community language (i.e., TF-IDF scores of relevant keywords) to perform the attribution. We test TUBERAIDER in the wild and present a few case studies of actual aggression attacks identified by it to showcase its effectiveness.
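The abstract does not spell out how the peak detector works, so the following is only one simple way such a detector could be built, not necessarily the one TubeRaider uses. It buckets a video's comment timestamps into equal-width bins and flags the busiest bin if it exceeds the mean bin count by more than `z` standard deviations. It also maps times onto the normalized thread-lifetime scale described for Figure 1 (0 = first mention of the video, 1 = last post in the thread). The bin count, the threshold `z`, and all function names are hypothetical.

```python
import statistics

# Hypothetical z-score peak detector; not the paper's exact method.


def bin_counts(timestamps, start, end, n_bins):
    """Count comments in n_bins equal-width bins spanning [start, end]."""
    width = (end - start) / n_bins
    counts = [0] * n_bins
    for t in timestamps:
        if start <= t <= end:
            # A timestamp equal to `end` would index past the last bin; clamp it.
            idx = min(int((t - start) / width), n_bins - 1)
            counts[idx] += 1
    return counts


def detect_peak(counts, z=2.0):
    """Return the index of the busiest bin if it is an outlier, else None."""
    if len(counts) < 2:
        return None
    mean = statistics.fmean(counts)
    std = statistics.pstdev(counts)
    peak = max(range(len(counts)), key=counts.__getitem__)
    if std > 0 and counts[peak] > mean + z * std:
        return peak
    return None


def normalize_time(t, first_mention, last_post):
    """Map a timestamp onto the thread lifetime: 0 = first mention, 1 = last post."""
    lifetime = last_post - first_mention
    if lifetime <= 0:
        raise ValueError("thread lifetime must be positive")
    return (t - first_mention) / lifetime


# Example: a burst of comments between t=50 and t=57 (arbitrary time units).
comments = [1, 5, 12, 20, 50, 51, 52, 53, 54, 55, 56, 57, 80, 95]
counts = bin_counts(comments, start=0, end=100, n_bins=10)
peak = detect_peak(counts)  # → 5
print(normalize_time(55, first_mention=45, last_post=65))  # → 0.5
```

A fixed `z` is the kind of threshold that would need tuning per platform or per video popularity; a detector that compares against a rolling baseline instead of the global mean is an equally plausible alternative.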

Paper Structure

This paper contains 28 sections, 3 figures, 7 tables.

Figures (3)

  • Figure 1: Probability Density Functions (PDFs) of the activity peak in YouTube comments and in the source-community thread from which the YouTube video is linked. Time is normalized to the thread's lifetime, where t = 0 denotes the time when the video was first mentioned and t = 1 is the last post in the thread.
  • Figure 2: Overview of TubeRaider: a set of communities is fed to the system, in this case (1) 4chan's /pol/ board, (2) the r/The_Donald subreddit, and (3) 16 Incels subreddits. TubeRaider learns their language through TF-IDF on the top keywords. To attribute potential attacks, it collects all YouTube comments on videos linked from each source community and identifies peaks in the comment activity of these videos as an indication of a potential coordinated attack. Finally, TubeRaider attributes attacks back to a source community using a machine learning classifier based on the TF-IDF scores of top keywords.
  • Figure 3: Attribution accuracy for various "Minimum Comments" thresholds.
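The attribution step in Figure 2 and the threshold sweep in Figure 3 can be illustrated together. The captions only state that a machine learning classifier operates on TF-IDF scores of top keywords; the sketch below swaps in a nearest-centroid classifier with cosine similarity purely to stay dependency-free, and it assumes the keyword list and IDF weights have already been learned from the source communities. Each evaluated video is summarized as a hypothetical `(n_comments, predicted, true)` tuple so accuracy can be recomputed for several "Minimum Comments" cut-offs: a higher cut-off keeps only videos with more text to classify, but leaves fewer videos. All names and data formats are illustrative.

```python
import math
from collections import Counter

# Hypothetical sketch: nearest-centroid stands in for the paper's classifier.


def features(comments, keywords, idf):
    """TF-IDF vector over a fixed, ordered keyword list for one video's comments."""
    tokens = [tok for c in comments for tok in c.lower().split()]
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty input
    return [counts[k] / total * idf.get(k, 0.0) for k in keywords]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(examples):
    """examples: (feature_vector, community) pairs -> {community: mean vector}."""
    sums, n = {}, Counter()
    for vec, label in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        n[label] += 1
    return {label: [x / n[label] for x in acc] for label, acc in sums.items()}


def attribute(vec, centroids):
    """Attribute a video to the community whose centroid is most cosine-similar."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


def accuracy_by_min_comments(records, thresholds):
    """records: (n_comments, predicted, true) per video.

    For each threshold, keep videos with at least that many comments and
    return the attribution accuracy on them (None if no video survives).
    """
    results = {}
    for th in thresholds:
        kept = [(p, t) for n, p, t in records if n >= th]
        results[th] = sum(p == t for p, t in kept) / len(kept) if kept else None
    return results


# Toy example with three keywords and uniform IDF weights.
keywords = ["kek", "maga", "chad"]
idf = {k: 1.0 for k in keywords}
centroids = train_centroids([
    (features(["kek kek lol"], keywords, idf), "pol"),
    (features(["maga rally"], keywords, idf), "the_donald"),
    (features(["chad stacy chad"], keywords, idf), "incels"),
])
print(attribute(features(["maga maga chad"], keywords, idf), centroids))  # → the_donald
```

Swapping `attribute` for any multi-class classifier that consumes the same feature vectors leaves `accuracy_by_min_comments` unchanged, which is the part that mirrors the sweep plotted in Figure 3.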