Tuning Into Bias: A Computational Study of Gender Bias in Song Lyrics

Danqing Chen; Adithi Satish; Rasul Khanbayov; Carolin M. Schuster; Georg Groh

TL;DR

This paper tackles the problem of quantifying gender bias in English song lyrics by combining topic modeling with bias measurement. It uses BERTopic to cluster 537,553 lyrics into topics and SC-WEAT on Word2Vec embeddings trained per genre and per top topic to quantify gender associations within those themes. The key contributions include per-topic and per-genre bias analyses revealing a shift from romantic to sexualized content over decades, and systematic biases where Intelligence and Strength lean male while Appearance and Weakness align with female associations. These findings highlight how thematic content and genre context shape gender stereotypes in lyrics, offering a computational lens for Digital Humanities and sociolinguistic interpretation with potential implications for media studies and cultural analysis.

Abstract

Text mining methods are increasingly applied in the Humanities and Computational Social Sciences, as well as in a broader range of disciplines. This paper presents an analysis of gender bias in English song lyrics using topic modeling and bias measurement techniques. Leveraging BERTopic, we cluster a dataset of 537,553 English songs into distinct topics and analyze their temporal evolution. Our results reveal a significant thematic shift in song lyrics over time, from romantic themes to a heightened focus on the sexualization of women. Additionally, we observe a substantial prevalence of profanity and misogynistic content across various topics, with a particularly high concentration in the largest thematic cluster. To analyze gender bias across topics and genres quantitatively, we employ the Single Category Word Embedding Association Test (SC-WEAT) to calculate bias scores for word embeddings trained on the most prominent topics as well as on individual genres. The results indicate a consistent male bias in words associated with intelligence and strength, while appearance and weakness words show a female bias. Further analysis highlights variations in these biases across topics, illustrating the interplay between thematic content and gender stereotypes in song lyrics.
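As a rough illustration of the bias score mentioned above, the sketch below computes an SC-WEAT-style effect size for a single target word: the difference between its mean cosine similarity to a male attribute set and to a female attribute set, divided by the standard deviation of all those similarities. The toy 2-D vectors, the function names, and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration; the paper's actual word lists, embeddings, and any aggregation over a target set are not reproduced here.

```python
import numpy as np


def cosine(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def sc_weat_effect_size(w, male_vecs, female_vecs):
    """Effect size for one target vector w (hypothetical helper).

    Positive values mean w is closer to the male attribute vectors,
    negative values mean it is closer to the female attribute vectors.
    """
    sims_m = np.array([cosine(w, a) for a in male_vecs])
    sims_f = np.array([cosine(w, b) for b in female_vecs])
    pooled = np.concatenate([sims_m, sims_f])
    # Assumption: sample standard deviation over both attribute sets.
    return float((sims_m.mean() - sims_f.mean()) / pooled.std(ddof=1))


# Toy 2-D "embeddings" (made up, not taken from the paper).
male = [np.array([1.0, 0.1]), np.array([0.9, 0.2])]
female = [np.array([0.1, 1.0]), np.array([0.2, 0.9])]

target_near_male = np.array([1.0, 0.0])
target_near_female = np.array([0.0, 1.0])

es_m = sc_weat_effect_size(target_near_male, male, female)
es_f = sc_weat_effect_size(target_near_female, male, female)
print(f"near-male target:   {es_m:+.3f}")
print(f"near-female target: {es_f:+.3f}")
```

In this toy setup the sign convention matches the one used in the figure captions: a positive score indicates male bias and a negative score indicates female bias.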

Paper Structure (15 sections, 3 equations, 10 figures, 4 tables)

Introduction
Related Work
Experimental Setup
Data
Topic Modeling with BERTopic
Bias Measurements - SC-WEAT
Results & Discussion
Topic Analysis
SC-WEAT Analysis
Conclusion
Appendix
Data Cleaning
Analysis of genre popularity across decades
Initial BERTopic Model
Topic Label Analysis Using c-TF-IDF Scores from the BERTopic Model

Figures (10)

Figure 1: Detailed workflow including data collection, topic modeling, and SC-WEAT.
Figure 2: Distribution of the top topic in each genre, with (n) representing the number of songs associated with that topic. As shown, the top topic in each genre often includes a significant proportion of songs from other genres, indicating genre overlap in topic composition.
Figure 3: Development over time of the top 10 topics in each genre and overall. The decline from 2010 to 2020 can be explained by the still-limited data for the 2020s.
Figure 4: c-TF-IDF score for the overall top topic: "nigga_niggas_bitch"
Figure 5: The SC-WEAT effect size of the target sets in each genre. A positive score indicates male bias, whereas a negative score indicates female bias, and n represents the number of word vectors for each genre.
...and 5 more figures
