The Monetisation of Toxicity: Analysing YouTube Content Creators and Controversy-Driven Engagement

Thales Bertaglia, Catalina Goanta, Adriana Iamnitchi

TL;DR

This paper investigates how controversy, toxicity, and monetisation interact on YouTube by constructing a large dataset of controversial creators identified through Reddit discussions (20 channels, 16,349 videos, 105,854,713 comments). It introduces a taxonomy of monetisation cues found in video descriptions, trains a Ridge regression toxicity predictor on the ALYT dataset using tf-idf features (RMSE = 0.175, R² = 0.381), and uses an adjusted toxicity score to better capture the distribution of toxicity across videos. The findings show that toxic comments correlate with higher comment-level engagement but with fewer likes and fewer monetisation cues, with notable differences between Consistent and Spike channels and a few outliers driving the extremes. The work provides a publicly available dataset and methods for studying controversy-driven engagement and monetisation, with implications for creators, platform policy, and advertisers.
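The TL;DR describes a Ridge regression toxicity predictor trained on tf-idf features and evaluated with RMSE and R². The sketch below shows that kind of pipeline in dependency-free Python so each step is visible. It is not the authors' implementation. It assumes lowercase whitespace tokenisation, scikit-learn-style smoothed idf with L2 normalisation, and a closed-form ridge solve with the intercept handled by centring. The toy comments and scores stand in for the ALYT data and are hypothetical. The paper's preprocessing and regularisation strength are not given here.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenisation; the paper's preprocessing may differ.
    return text.lower().split()


def fit_idf(docs):
    """Smoothed idf, ln((1 + n) / (1 + df)) + 1, mirroring scikit-learn's default."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    vocab = sorted(df)
    return vocab, {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}


def tfidf(docs, vocab, idf):
    """Raw term counts times idf, L2-normalised; out-of-vocabulary tokens are ignored."""
    rows = []
    for doc in docs:
        tf = Counter(tokenize(doc))
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        rows.append([v / norm for v in vec])
    return rows


def solve(A, b):
    """Gaussian elimination with partial pivoting for a square system A x = b."""
    n = len(A)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge: centre X and y, then solve (Xc^T Xc + alpha I) w = Xc^T yc."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (alpha if i == j else 0.0) for j in range(p)]
         for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve(A, b)
    intercept = y_mean - sum(wj * mj for wj, mj in zip(w, x_mean))
    return w, intercept


def predict(X, w, intercept):
    return [intercept + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def rmse(y_true, y_pred):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical toy comments and toxicity scores standing in for ALYT-style labels.
    train = ["you are an idiot", "stupid idiot", "great video thanks", "thanks for the content"]
    scores = [0.9, 0.8, 0.0, 0.1]
    vocab, idf = fit_idf(train)
    X_train = tfidf(train, vocab, idf)
    w, b = fit_ridge(X_train, scores, alpha=1.0)
    fitted = predict(X_train, w, b)
    print("train RMSE:", rmse(scores, fitted), "train R2:", r2(scores, fitted))
    print(predict(tfidf(["what an idiot", "great content"], vocab, idf), w, b))
```

In practice, scikit-learn's `TfidfVectorizer` and `Ridge` would replace the hand-written pieces. The closed-form solve here builds a p × p matrix over the vocabulary, so it only suits small toy inputs.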

Abstract

YouTube is a major social media platform that plays a significant role in digital culture, with content creators at its core. These creators often engage in controversial behaviour to drive engagement, which can foster toxicity. This paper presents a quantitative analysis of controversial content on YouTube, focusing on the relationship between controversy, toxicity, and monetisation. We introduce a curated dataset comprising 20 controversial YouTube channels identified from Reddit discussions, including 16,349 videos and more than 105 million comments. We identify monetisation cues in video descriptions and categorise them into monetisation models, including affiliate marketing and direct selling, using lists of URLs and keywords. Additionally, we train a machine learning model to measure the toxicity of comments on these videos. Our findings reveal that while toxic comments correlate with higher engagement, they negatively impact monetisation, indicating that controversy-driven interaction does not necessarily lead to financial gain. We also observe significant variation in monetisation strategies, with some creators showing extensive monetisation despite high toxicity levels. Our study contributes a curated dataset, lists of URLs and keywords for categorising monetisation, and a machine learning model for measuring toxicity. It is a significant step towards understanding the complex relationship between controversy, engagement, and monetisation on YouTube. The lists used for detecting and categorising monetisation cues are available at https://github.com/thalesbertaglia/toxmon.
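The abstract describes detecting monetisation cues in video descriptions with lists of URLs and keywords. Below is a minimal sketch of such a rule-based matcher. The `CUE_LISTS` structure and its domains and keywords are hypothetical placeholders, covering only the two models the abstract names. The real lists are published in the toxmon repository, and their format may differ. A description is tagged with a category if a URL host matches a listed domain, including subdomains. It is also tagged if a listed keyword appears as a whole phrase.

```python
import re
from urllib.parse import urlparse

# Hypothetical placeholder lists for the two models named in the abstract;
# the paper's actual URL and keyword lists are published in the toxmon repository.
CUE_LISTS = {
    "affiliate_marketing": {
        "domains": {"amzn.to"},
        "keywords": {"affiliate link", "commission"},
    },
    "direct_selling": {
        "domains": {"teespring.com"},
        "keywords": {"merch", "my store"},
    },
}

# Rough URL pattern: an http(s) scheme followed by non-whitespace, non-bracket characters.
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+", re.IGNORECASE)


def host_matches(host, domains):
    """True if host equals a listed domain or is a subdomain of one (e.g. www.)."""
    return any(host == d or host.endswith("." + d) for d in domains)


def keyword_matches(text, keywords):
    """True if any keyword occurs as a whole phrase in the lowercased text."""
    return any(re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in keywords)


def monetisation_cues(description):
    """Return the set of categories whose listed URLs or keywords appear in the description."""
    text = description.lower()
    hosts = [urlparse(u).hostname or "" for u in URL_RE.findall(description)]
    found = set()
    for category, lists in CUE_LISTS.items():
        if any(host_matches(h, lists["domains"]) for h in hosts) or keyword_matches(
            text, lists["keywords"]
        ):
            found.add(category)
    return found


if __name__ == "__main__":
    print(monetisation_cues("Mic I use: https://amzn.to/3abc (affiliate link)"))
```

Whole-phrase matching avoids some false positives, such as "merch" inside "merchandise". It also misses those variants, which is why list-based approaches depend heavily on how the lists are curated.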

Paper Structure

This paper contains 6 sections, 1 equation, 2 figures, and 8 tables.

Figures (2)

  • Figure 1: Videos published per year in the dataset.
  • Figure 2: Percentage of videos with monetisation cues by channel category over time.
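Figure 2 reports the percentage of videos with monetisation cues by channel category over time. The sketch below shows one way such a series could be computed. It assumes hypothetical per-video records with `category`, `year` and `has_cue` fields; these names are illustrative and not the dataset's schema. The sample categories borrow the Consistent and Spike labels from the TL;DR, though this section does not state exactly which categories the figure uses.

```python
from collections import defaultdict


def cue_share_by_category_year(videos):
    """Percentage of videos with at least one monetisation cue per (category, year).

    `videos` is an iterable of dicts with hypothetical keys 'category', 'year'
    and 'has_cue'; the field names are illustrative, not the dataset's schema.
    """
    totals = defaultdict(int)
    with_cue = defaultdict(int)
    for video in videos:
        key = (video["category"], video["year"])
        totals[key] += 1
        if video["has_cue"]:
            with_cue[key] += 1
    # Sort keys so each category's yearly series comes out in order.
    return {key: 100.0 * with_cue[key] / totals[key] for key in sorted(totals)}


if __name__ == "__main__":
    sample = [
        {"category": "Consistent", "year": 2020, "has_cue": True},
        {"category": "Consistent", "year": 2020, "has_cue": False},
        {"category": "Spike", "year": 2021, "has_cue": True},
    ]
    for (category, year), pct in cue_share_by_category_year(sample).items():
        print(f"{category} {year}: {pct:.1f}%")
```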