Language models for longitudinal analysis of abusive content in Billboard Music Charts

Rohitash Chandra, Yathin Suresh, Divyansh Raj Sinha, Sanchit Jindal

TL;DR

This work presents a longitudinal analysis of abusive and explicit content in Billboard-chart lyrics using deep learning and pre-trained language models. It assembles a seven-decade corpus of chart lyrics annotated with explicit-content labels and applies fine-tuned BERT and RoBERTa models, alongside Bi-LSTM baselines, for sentiment analysis and abuse detection, augmented by HateBERT for abuse classification. The study finds a pronounced rise in explicit content after 1990 and uncovers nuanced, genre-specific emotion dynamics, and it presents a structured framework for longitudinal content analysis with practical policy implications. The approach can support improved content filtering and parental controls on music platforms; the authors acknowledge biases in the labels and call for broader validation and for future work that incorporates richer contextual cues and larger LLMs.
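The longitudinal part of such a framework ultimately reduces to aggregating per-song labels over time, whether those labels come from the dataset's explicit flags or from a classifier's predictions. A minimal sketch, assuming each song is a record with a release year and a binary explicit flag; the `Song` type and `explicit_share_by_decade` function are hypothetical illustrations, not the paper's code:

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Song:
    """Hypothetical record: one chart entry with its release year and a binary explicit label."""
    title: str
    year: int
    explicit: bool


def explicit_share_by_decade(songs):
    """Return {decade_start: percentage of songs labelled explicit}, sorted by decade."""
    totals = defaultdict(int)
    explicit = defaultdict(int)
    for song in songs:
        decade = (song.year // 10) * 10  # e.g. 1994 -> 1990
        totals[decade] += 1
        if song.explicit:
            explicit[decade] += 1
    return {d: 100.0 * explicit[d] / totals[d] for d in sorted(totals)}


if __name__ == "__main__":
    # Toy data for illustration only; not drawn from the paper's corpus.
    toy = [
        Song("a", 1965, False),
        Song("b", 1992, True),
        Song("c", 1997, False),
        Song("d", 2015, True),
    ]
    print(explicit_share_by_decade(toy))
```

Grouping by decade rather than by year smooths out years with few charting songs; switching `decade` to `song.year` gives a per-year curve instead.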

Abstract

There is no doubt that there has been a drastic increase in abusive and sexually explicit content in music, particularly in the Billboard Music Charts. However, there is a lack of studies that validate this trend for effective policy development, even though such content can cause harmful behavioural changes in children and youths. In this study, we utilise deep learning methods to analyse songs (lyrics) from the Billboard Charts of the United States over the last seven decades. We provide a longitudinal study using deep learning and language models and review the evolution of content using sentiment analysis and abuse detection, including sexually explicit content. Our results show a significant rise in explicit content in popular music from 1990 onwards. Furthermore, we find an increasing prevalence of songs with lyrics containing profane, sexually explicit, and otherwise inappropriate language. The longitudinal analysis demonstrates the ability of language models to capture nuanced patterns in lyrical content, reflecting shifts in societal norms and language use over time.
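The abstract reports a significant rise in explicit content from 1990 onwards but does not state here which test supports that claim. One standard way to check such a shift is a two-sided, two-proportion z-test on the share of explicit songs before and after a cut-off year. The sketch below is an assumed illustration, not the authors' procedure, and the counts in the usage block are made up:

```python
import math


def two_proportion_ztest(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 == p2, using the pooled proportion.

    x1, x2: numbers of explicit songs in each period; n1, n2: total songs per period.
    Returns (z, p_value); z > 0 means the second period has the higher share.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    variance = pooled * (1 - pooled) * (1 / n1 + 1 / n2)
    if variance == 0:
        raise ValueError("pooled proportion is 0 or 1; the test is undefined")
    z = (p2 - p1) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


if __name__ == "__main__":
    # Hypothetical counts for illustration only: pre-1990 vs 1990-onwards.
    z, p = two_proportion_ztest(x1=30, n1=1000, x2=150, n2=1000)
    print(f"z = {z:.2f}, p = {p:.2e}")
```

Year-to-year chart entries are not independent samples, so a test like this is at best a rough check; a regression on year or a permutation test over periods would be reasonable alternatives.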

Paper Structure

This paper contains 21 sections, 16 figures, 7 tables.

Figures (16)

  • Figure 1: Percentage of Explicit Songs by Year
  • Figure 2: Framework for review of abusive and inappropriate content in Billboard Charts
  • Figure 3: Distribution of songs by genre in the dataset.
  • Figure 4: Top 10 trigrams over the years: 1990 - 2024
  • Figure 5: Comparison of model accuracy
  • ...and 11 more figures
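Figure 4 summarises recurring phrasing through trigram frequencies over 1990 to 2024. A minimal sketch of such a count, assuming lyrics are available as (year, text) pairs and using a simple regex tokenizer; the function name and tokenization are illustrative assumptions, not the paper's preprocessing:

```python
import re
from collections import Counter


def top_trigrams(songs, start, end, k=10):
    """Return the k most frequent word trigrams for songs with start <= year <= end.

    songs: iterable of (year, lyrics) pairs. Trigrams never span two songs.
    """
    counts = Counter()
    for year, text in songs:
        if not start <= year <= end:
            continue
        tokens = re.findall(r"[a-z']+", text.lower())  # crude lowercase word tokenizer
        counts.update(zip(tokens, tokens[1:], tokens[2:]))
    return counts.most_common(k)


if __name__ == "__main__":
    # Toy lyrics for illustration only.
    toy = [(1985, "la la la la"), (1995, "oh la la la"), (2000, "la la la")]
    print(top_trigrams(toy, 1990, 2024, k=2))
```

In practice, stop-word handling and removal of repeated chorus lines would strongly affect which trigrams rise to the top.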