Emotion-aware Personalized Music Recommendation with a Heterogeneity-aware Deep Bayesian Network

Erkang Jing, Yezheng Liu, Yidong Chai, Shuo Yu, Longshun Liu, Yuanchun Jiang, Yang Wang

TL;DR

This work tackles emotion-aware music recommendation by modeling four types of heterogeneity: emotion heterogeneity across users, emotion heterogeneity within a user, music mood preference heterogeneity across users under the same emotion, and music mood preference heterogeneity within a user over time. It introduces the Heterogeneity-aware Deep Bayesian Network (HDBN), a generative framework that learns a personalized prior user emotion distribution, a posterior user emotion distribution informed by self-reported emotions, user groups with group-specific mood predictors, and within-group variability in mood preference via Bayesian neural networks. The model is optimized with a variational ELBO and Bayesian Personalized Ranking. The authors validate HDBN on EmoMusicLJ and EmoMusicLJ-small, where it outperforms a wide range of baselines; ablations, sensitivity analyses, and case studies support the contribution of each heterogeneity component. The authors release both datasets and the source code. The work also discusses implications for streaming platforms seeking mood-aware personalization and outlines future directions, including richer priors and emotion signals beyond self-reports.
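
For orientation, the two objectives named in the TL;DR have the following generic textbook forms. This is only a sketch: the symbols ($z$ for a latent variable, $x$ for observations, $q_\phi$ for the variational posterior, $\hat{x}_{ui}$ for the predicted score of user $u$ on item $i$, and $j$ for a sampled negative item) are illustrative assumptions, and this summary does not state how HDBN instantiates or weights the two terms.

$$\mathcal{L}_{\mathrm{ELBO}} = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big)$$

$$\mathcal{L}_{\mathrm{BPR}} = -\sum_{(u,i,j)} \ln \sigma\big(\hat{x}_{ui} - \hat{x}_{uj}\big)$$

The ELBO is maximized, while the BPR loss is minimized so that interacted items score higher than sampled negatives.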

Abstract

Music recommender systems play a critical role in music streaming platforms by providing users with music that they are likely to enjoy. Recent studies have shown that user emotions can influence users' preferences for music moods. However, existing emotion-aware music recommender systems (EMRSs) explicitly or implicitly assume that users' actual emotional states expressed through identical emotional words are homogeneous. They also assume that users' music mood preferences are homogeneous under the same emotional state. In this article, we propose four types of heterogeneity that an EMRS should account for: emotion heterogeneity across users, emotion heterogeneity within a user, music mood preference heterogeneity across users, and music mood preference heterogeneity within a user. We further propose a Heterogeneity-aware Deep Bayesian Network (HDBN) to model these assumptions. The HDBN mimics a user's decision process of choosing music with four components: personalized prior user emotion distribution modeling, posterior user emotion distribution modeling, user grouping, and Bayesian neural network-based music mood preference prediction. We constructed two datasets, called EmoMusicLJ and EmoMusicLJ-small, to validate our method. Extensive experiments demonstrate that our method significantly outperforms baseline approaches on metrics of HR, Precision, NDCG, and MRR. Ablation studies and case studies further validate the effectiveness of our HDBN. The source code and datasets are available at https://github.com/jingrk/HDBN.
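
The four reported metrics can be illustrated with a minimal sketch using the standard binary-relevance definitions. The function names, the cutoff `k`, and the toy data are assumptions made for illustration; the abstract does not state the paper's exact evaluation protocol.

```python
import math
from typing import Hashable, Sequence, Set


def hit_ratio_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """1.0 if at least one relevant item appears in the top-k, else 0.0."""
    return 1.0 if any(item in relevant for item in ranked[:k]) else 0.0


def precision_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the top-k positions occupied by relevant items."""
    return sum(item in relevant for item in ranked[:k]) / k


def ndcg_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Binary-relevance NDCG: DCG of the top-k divided by the ideal DCG."""
    dcg = sum(
        1.0 / math.log2(pos + 2)  # pos is 0-based, so rank 1 gets 1/log2(2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """1 / rank of the first relevant item; 0.0 if none is ranked."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


# Toy example (hypothetical item ids): one held-out song ranked third.
ranked_items = ["song_a", "song_b", "song_c", "song_d"]
held_out = {"song_c"}
hr = hit_ratio_at_k(ranked_items, held_out, k=3)
prec = precision_at_k(ranked_items, held_out, k=3)
ndcg = ndcg_at_k(ranked_items, held_out, k=3)
rr = reciprocal_rank(ranked_items, held_out)
```

Averaging these per-user values over all test users gives HR@K, Precision@K, NDCG@K, and MRR.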

Paper Structure

This paper contains 37 sections, 42 equations, 20 figures, 7 tables, and 2 algorithms.

Figures (20)

  • Figure 1: Toy examples of the existing EMRSs assumptions (a, b, c, and d) and the four heterogeneity assumptions proposed in our method (e, f, g, and h).
  • Figure 2: Conceptual framework of HDBN. The contents in the dotted boxes correspond to our approaches to address the four types of heterogeneity. After obtaining the predicted user’s preferred music mood, combined with the user interaction history and the true music mood label, the user’s rating score of the music can be generated.
  • Figure 3: An overview of the HDBN model flow.
  • Figure 4: Graphical representation and notation description of the generative process.
  • Figure 5: The inference networks for $\boldsymbol{\mu}_u$ and $\boldsymbol{\mu}_{u,v}$. The dashed arrows represent regularization from the prior distribution.
  • ...and 15 more figures