Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks

Marco De Nadai, Francesco Fabbri, Paul Gigioli, Alice Wang, Ang Li, Fabrizio Silvestri, Laura Kim, Shawn Lin, Vladan Radosavljevic, Sandeep Ghael, David Nyhan, Hugues Bouchard, Mounia Lalmas-Roelleke, Andreas Damianou

TL;DR

This paper introduces 2T-HGNN, a scalable recommendation system that combines Heterogeneous Graph Neural Networks (HGNNs) with a Two Tower (2T) model to uncover nuanced item relationships while keeping latency and complexity low.

Abstract

In the ever-evolving digital audio landscape, Spotify, well-known for its music and talk content, has recently introduced audiobooks to its vast user base. While promising, this move presents significant challenges for personalized recommendations. Unlike music and podcasts, audiobooks, initially available for a fee, cannot be easily skimmed before purchase, posing higher stakes for the relevance of recommendations. Furthermore, introducing a new content type into an existing platform confronts extreme data sparsity, as most users are unfamiliar with this new content type. Lastly, recommending content to millions of users requires the model to react fast and be scalable. To address these challenges, we leverage podcast and music user preferences and introduce 2T-HGNN, a scalable recommendation system comprising Heterogeneous Graph Neural Networks (HGNNs) and a Two Tower (2T) model. This novel approach uncovers nuanced item relationships while ensuring low latency and complexity. We decouple users from the HGNN graph and propose an innovative multi-link neighbor sampler. These choices, together with the 2T component, significantly reduce the complexity of the HGNN model. Empirical evaluations involving millions of users show significant improvement in the quality of personalized recommendations, resulting in a +46% increase in new audiobooks start rate and a +23% boost in streaming rates. Intriguingly, our model's impact extends beyond audiobooks, benefiting established products like podcasts.
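To make the latency argument concrete, here is a minimal sketch of how a generic two-tower recommender serves requests. It is not the authors' implementation: the one-layer random towers, the feature vectors, the item names, and the dot-product scoring are all assumptions chosen for illustration. The point it shows is that item vectors can be computed offline, so a request only needs one user-tower pass plus cheap similarity scores.

```python
import math
import random


def make_tower(in_dim, out_dim, seed):
    """Return a toy tower: a fixed random linear map followed by L2 normalisation.

    A real tower would be a trained neural network; this stand-in is hypothetical.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(out_dim)]

    def tower(features):
        out = [sum(w * x for w, x in zip(row, features)) for row in weights]
        norm = math.sqrt(sum(v * v for v in out)) or 1.0
        return [v / norm for v in out]

    return tower


def top_k(user_vec, item_vecs, k):
    """Rank items by dot product with the user vector and keep the best k."""
    scored = [
        (sum(u * v for u, v in zip(user_vec, vec)), item_id)
        for item_id, vec in item_vecs.items()
    ]
    scored.sort(reverse=True)
    return [(item_id, score) for score, item_id in scored[:k]]


# Hypothetical towers and features (dimensions are arbitrary).
user_tower = make_tower(in_dim=4, out_dim=3, seed=0)
item_tower = make_tower(in_dim=5, out_dim=3, seed=1)

catalog = {
    "book_a": [0.1, 0.8, 0.3, 0.0, 0.5],
    "book_b": [0.9, 0.1, 0.4, 0.7, 0.2],
    "book_c": [0.3, 0.3, 0.9, 0.1, 0.6],
}

# Offline step: embed every item once and store the result.
item_vecs = {item_id: item_tower(feats) for item_id, feats in catalog.items()}

# Online step: embed the user at request time and score against stored vectors.
user_vec = user_tower([0.2, 0.9, 0.1, 0.4])
ranking = top_k(user_vec, item_vecs, k=2)
print(ranking)
```

In 2T-HGNN the item-side inputs would include the HGNN embeddings; in this sketch they are replaced by made-up feature lists.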

Paper Structure (25 sections, 7 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: A) our users' consumption patterns, which involve audiobooks and podcasts; B) we build a co-listening graph with nodes representing audiobooks or podcasts, and edges connecting nodes whenever at least one user streams both; C) Audiobook IT gets recommended because 2T-HGNN performs non-trivial recommendations using 2-hop distant patterns. Delicious is similar to Taste. Taste is co-listened with Fake Doctors, which is co-listened with IT.
  • Figure 2: A) The audiobook consumption at launch is very sparse. 25% of users account for 75% of all streaming hours. B) Users having similar audiobook taste are more similar in podcast preferences than users selected at random. C) Audiobooks co-listened by at least one user have similar content embeddings (LLM embeddings extracted from the title and description of the audiobooks). D) Two audiobooks co-listened with the same podcast but not with each other have similar content embeddings.
  • Figure 3: Overview of our model. A) We represent audiobook-podcast relationships using a heterogeneous graph comprising two node types, audiobook and podcast, connected to each other whenever at least one user has listened to both. Each node has LLM embedding features extracted from the titles and descriptions of audiobooks and podcasts. We use a 2-layer HGNN on top of this graph. B) Our 2T model recommends audiobooks to users by exploiting HGNN embeddings, user demographic features (e.g. country and age), and historical user interactions (music, podcasts and audiobooks) represented as embeddings.
  • Figure 4: Co-occurrence Patterns among weak signals. The heatmap illustrates the distribution of signal co-occurrences, with each $(i,j)$ entry representing the fraction of occurrences of signal $j$ in relation to the total occurrences of signal $i$. The adjacent bar plot on the right provides insights into the relative distribution of signals within rows.
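The co-listening graph described in the Figure 1 and Figure 3 captions can be sketched in a few lines. This is an illustrative reconstruction under stated assumptions, not the paper's pipeline: the user IDs and streaming histories are invented, node types and LLM features are omitted, and a plain breadth-first search stands in for the 2-layer HGNN to show why a 2-hop neighbourhood can reach IT from Taste.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_co_listening_graph(streams):
    """Connect two items whenever at least one user has streamed both.

    streams: dict mapping a user id to the set of item titles they streamed.
    Returns an adjacency dict mapping each item to its co-listened items.
    """
    graph = defaultdict(set)
    for items in streams.values():
        for a, b in combinations(sorted(items), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def hop_distances(graph, start, max_hops):
    """Breadth-first search: items reachable from start within max_hops edges."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop budget
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Hypothetical histories that reproduce the chain named in the Figure 1 caption.
streams = {
    "user_1": {"Taste", "Fake Doctors"},
    "user_2": {"Fake Doctors", "IT"},
}

graph = build_co_listening_graph(streams)
hops = hop_distances(graph, "Taste", max_hops=2)
print(hops["IT"])  # 2
```

No single user in this toy data streamed both Taste and IT, so there is no direct edge between them; the connection only appears through Fake Doctors. A model that aggregates over two hops can pick up that pattern, whereas a purely pairwise co-occurrence count cannot.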