On the Interplay between Musical Preferences and Personality through the Lens of Language

Eliran Shem-Tov, Ella Rabinovich

TL;DR

This study examines whether musical preferences are encoded in spontaneous language, viewed through the Big Five personality traits. It introduces GenBigFive, a large, LLM-generated corpus of trait-specific text, and trains robust logistic-regression classifiers that predict the five personality dimensions from embeddings. Applying these models to a Reddit-based dataset of nearly 5,000 users across five genres reveals significant, interpretable differences in personality profiles among genre fans, and a modest but above-chance ability to predict genre from personality alone. The work provides open resources and demonstrates a scalable approach to integrating language, music psychology, and personality analysis, with potential applications in personalization and sociolinguistics.

Abstract

Music serves as a powerful reflection of individual identity, often aligning with deeper psychological traits. Prior research has established correlations between musical preferences and personality, while separate studies have demonstrated that personality is detectable through linguistic analysis. Our study bridges these two research domains by investigating whether individuals' musical preferences leave traces in their spontaneous language through the lens of the Big Five personality traits (Openness, Conscientiousness, Extroversion, Agreeableness, and Neuroticism). Using a carefully curated dataset of over 500,000 text samples from nearly 5,000 authors with reliably identified musical preferences, we build advanced models to assess personality characteristics. Our results reveal significant personality differences across fans of five musical genres. We release resources for future research at the intersection of computational linguistics, music psychology and personality analysis.

Paper Structure

This paper contains 40 sections, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Distribution of the number of users (out of the pool) that posted or commented at least once in one of the top-10 most popular subreddits in the data, by genre. The various topical threads are represented roughly equally by the fans of the five musical genres.
  • Figure 2: Mean personality trait by musical genre fans. All five traits exhibit significant within-community differences, as measured by the ANOVA test.
  • Figure 3: Cohen's $d$ effect size and pairwise Mann-Whitney test. The value in each cell is the effect size; the cell's color indicates whether the difference is significant at the $p < 0.05$ level. A positive effect size (the left group has a higher trait value than the right) is marked in red and a negative effect size in blue; both colors denote a significant difference. Uncolored cells represent non-significant differences.
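
The two statistics named in the Figure 3 caption can be illustrated with a minimal sketch. It assumes the common pooled-standard-deviation form of Cohen's $d$ and a two-sided Mann-Whitney test with a normal approximation; the section does not say which variants the authors used, so these choices, the function names, and the toy scores are illustrative assumptions rather than the paper's procedure.

```python
import math
from statistics import mean, variance


def cohens_d(a, b):
    """Cohen's d using a pooled standard deviation (assumed variant)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation.

    Ties get average ranks and the variance is tie-corrected; no continuity
    correction is applied. Returns (U for group a, approximate p-value).
    """
    na, nb = len(a), len(b)
    n = na + nb
    # Sort all values, remembering which original position each came from.
    pooled = sorted((value, idx) for idx, value in enumerate(list(a) + list(b)))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of this tie block
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = avg_rank
        t = j - i + 1
        tie_term += t**3 - t
        i = j + 1
    u_a = sum(ranks[:na]) - na * (na + 1) / 2
    mu = na * nb / 2
    sigma = math.sqrt(na * nb / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:  # all values identical: no evidence of a difference
        return u_a, 1.0
    z = (u_a - mu) / sigma
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_a, p_value


# Hypothetical trait scores for fans of two genres (made-up numbers).
left_group = [0.61, 0.58, 0.66, 0.70, 0.63]
right_group = [0.52, 0.49, 0.57, 0.55, 0.50]
d = cohens_d(left_group, right_group)
u, p = mann_whitney_u(left_group, right_group)
print(f"d = {d:.2f}, U = {u:.1f}, p = {p:.3f}")
```

Under the caption's convention, a positive `d` here would place the cell on the red side, and the cell would be colored only if `p` falls below 0.05. For samples this small an exact test is usually preferable to the normal approximation.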