
Exploring Genre and Success Classification through Song Lyrics using DistilBERT: A Fun NLP Venture

Servando Pizarro Martinez, Moritz Zimmermann, Miguel Serkan Offermann, Florian Reither

TL;DR

The paper predicts genre, success, and approximate release year from song lyrics using NLP. It uses a DistilBERT-based feature extractor for classification and BERT embeddings with support vector regression (SVR) for year estimation, evaluated on a Genius lyrics dataset. Key findings are a genre accuracy of 65%, a success accuracy of 79%, and a release-year RMSE of 14.18 (via SVR), indicating that lyric content carries usable signal for all three tasks. The work culminates in an interactive dashboard for lyric-driven music analysis, offering insights into emotional and thematic aspects of songs without audio data.

Abstract

This paper presents a natural language processing (NLP) approach to understanding song lyrics, focusing on genre classification, view-based success prediction, and approximate release year estimation. Our tests provide promising results, with 65% accuracy in genre classification and 79% accuracy in success prediction, leveraging a DistilBERT model for genre classification and BERT embeddings for release year prediction. Support Vector Machines outperformed other models in predicting the release year, achieving the lowest root mean squared error (RMSE) of 14.18. Our study offers insights that have the potential to revolutionize our relationship with music by addressing the shortcomings of current approaches in capturing the emotional intricacies of song lyrics.
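
For readers unfamiliar with the metric, the following minimal sketch shows how an RMSE for release-year prediction is computed; since the target is a year, the result is expressed in years. The function name `rmse` and the sample years are hypothetical and invented for illustration. They are not taken from the paper's data or code.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    # Square each prediction error, average, then take the square root.
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical release years and model predictions (illustration only).
true_years = [1975, 1988, 1999, 2010, 2021]
pred_years = [1985, 1980, 2005, 2000, 2015]

print(round(rmse(true_years, pred_years), 2))
```

Because errors are squared before averaging, a few predictions that miss by several decades raise the RMSE more than many small misses do, which is worth keeping in mind when reading the reported value of 14.18.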

Paper Structure

This paper contains 16 sections, 4 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: The proposed feature extraction model
  • Figure 2: The DistilBERT model architecture and components
  • Figure 3: Distribution of Sentiment Scores by Genre
  • Figure 4: Sentiment Trends over Years
  • Figure 5: Classification Report for Predicting Genre
  • ...and 2 more figures