
Do Recommender Systems Promote Local Music? A Reproducibility Study Using Music Streaming Data

Kristina Matrosova, Lilian Marey, Guillaume Salha-Galvan, Thomas Louail, Olivier Bodini, Manuel Moussallam

TL;DR

This paper investigates whether recommender systems promote local music by reproducing and extending a prior study on the LFM-2b dataset using a proprietary Deezer dataset. It formalizes the measurement of local music bias, compares two recommendation models (NeuMF and ItemKNN) under global and country-specific training, and shows that the direction and magnitude of the bias are highly sensitive to the dataset, the number of recommended tracks $K$, the training variant, and the labeling source. The findings reveal substantial cross-dataset differences in local music consumption and demonstrate that biased or incomplete labels can distort measurements of algorithmic bias. By releasing the Deezer dataset and code, the work argues that cross-dataset validation and transparent labeling are essential for reliable assessments of local music representation in recommender systems.
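The TL;DR refers to a formalized local-bias measure whose value depends on $K$ and on the labeling source, but the exact formula is not reproduced in this summary. The Python sketch below therefore assumes one plausible definition: for each user, the share of local tracks among the top-$K$ recommendations minus the share of local streams in that user's listening history, averaged over users of a country and counting labeled tracks only. The names `local_share`, `local_bias_at_k`, and the `labels` dictionary are hypothetical, not the authors' code.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence

# Hypothetical label map: track id -> country label (e.g. a MusicBrainz-derived
# label). Tracks missing from the map are unlabeled and are ignored.
Labels = Dict[str, str]


def local_share(tracks: Sequence[str], labels: Labels, country: str) -> Optional[float]:
    """Fraction of labeled tracks (or streams) whose label equals `country`."""
    labeled = [labels[t] for t in tracks if t in labels]
    if not labeled:
        return None  # undefined when no track is labeled
    return sum(1 for c in labeled if c == country) / len(labeled)


def local_bias_at_k(
    histories: Dict[str, List[str]],
    recommendations: Dict[str, List[str]],
    labels: Labels,
    country: str,
    k: int,
) -> Optional[float]:
    """ASSUMED definition: mean over users of
    (local share in top-K recommendations) - (local share in listening history).
    Positive values mean recommendations contain more local music than the history."""
    deltas = []
    for user, history in histories.items():
        hist_share = local_share(history, labels, country)
        rec_share = local_share(recommendations.get(user, [])[:k], labels, country)
        if hist_share is not None and rec_share is not None:
            deltas.append(rec_share - hist_share)
    return mean(deltas) if deltas else None


# Toy example: one French user whose history is 25% local (1 of 4 labeled streams).
labels = {"a": "FR", "b": "FR", "c": "US", "d": "US"}
histories = {"u1": ["a", "c", "c", "d"]}
recommendations = {"u1": ["a", "b", "c", "x"]}  # "x" is unlabeled
print(local_bias_at_k(histories, recommendations, labels, "FR", k=2))
```

With `k=2` the top recommendations are fully local, so the bias is 1.0 - 0.25; with `k=4` the unlabeled track is dropped and the local share falls to 2/3. This illustrates, under the stated assumption, why both $K$ and label coverage can shift the measured bias.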

Abstract

This paper examines the influence of recommender systems on local music representation, discussing prior findings from an empirical study on the LFM-2b public dataset. This prior study argued that different recommender systems exhibit algorithmic biases shifting music consumption either towards or against local content. However, LFM-2b users do not reflect the diverse audience of music streaming services. To assess the robustness of this study's conclusions, we conduct a comparative analysis using proprietary listening data from a global music streaming service, which we publicly release alongside this paper. We observe significant differences in local music consumption patterns between our dataset and LFM-2b, suggesting that caution should be exercised when drawing conclusions on local music based solely on LFM-2b. Moreover, we show that the algorithmic biases exhibited in the original work differ on our dataset, and that several unexplored model parameters can significantly influence these biases and affect the study's conclusions on both datasets. Finally, we discuss the complexity of accurately labeling local music, emphasizing the risk of misleading conclusions due to unreliable, biased, or incomplete labels. To encourage further research and ensure reproducibility, we have publicly shared our dataset and code.

Paper Structure

This paper contains 26 sections, 4 equations, 4 figures, and 2 tables.

Figures (4)

  • Figure 1: Proportion of local streams by country, according to the LFM-2b and Deezer datasets. All values are computed using MusicBrainz labels, by considering labeled tracks only.
  • Figure 2: Histograms of the proportion of local streams per user (considering labeled tracks only). Results are split by dataset (i.e., LFM-2b or Deezer), country (i.e., France, Germany, or Brazil), and labeling source (i.e., MusicBrainz labels, Deezer's country of activity, or Deezer's country of origin).
  • Figure 3: Local music algorithmic biases of ItemKNN and NeuMF on LFM-2b users in France, Germany, and Brazil, computed for a number of recommended tracks $K$ ranging from 10 to 100 in steps of 5. Results are split by training variant ("Global" models are trained using listening data from users of all countries, while "Local" models are trained using only listening data from users of the same country). All values are averaged over 20 model runs and reported with $\pm$ 1 standard deviation intervals. Values above (respectively, below) the "No bias" horizontal dotted line at 0 indicate that the model exhibits a positive (respectively, negative) algorithmic bias towards local music.
  • Figure 4: Local music algorithmic biases of ItemKNN and NeuMF on Deezer users in France, Germany, and Brazil, computed for a number of recommended tracks $K$ ranging from 10 to 100 in steps of 5. Results are split by training variant ("Global" models are trained using listening data from users of all countries, while "Local" models are trained using only listening data from users of the same country) and by labeling source (i.e., MusicBrainz labels, Deezer's country of activity, or Deezer's country of origin). All values are averaged over 20 model runs and reported with $\pm$ 1 standard deviation intervals. Values above (respectively, below) the "No bias" horizontal dotted line at 0 indicate that the model exhibits a positive (respectively, negative) algorithmic bias towards local music.
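Figures 3 and 4 report each bias for $K$ from 10 to 100 in steps of 5, averaged over 20 model runs with $\pm$ 1 standard deviation. A minimal Python sketch of that aggregation follows. `bias_fn` is a hypothetical callable returning the bias of one trained model run at a given $K$, and using the sample standard deviation (`statistics.stdev`) is an assumption, since the captions do not state which estimator is used.

```python
from statistics import mean, stdev
from typing import Callable, Dict, List, Sequence, Tuple

# K grid described in the captions of Figures 3 and 4: 10, 15, ..., 100.
K_GRID: List[int] = list(range(10, 101, 5))


def summarize_runs(
    bias_fn: Callable[[int, int], float],
    n_runs: int = 20,
    k_grid: Sequence[int] = K_GRID,
) -> Dict[int, Tuple[float, float]]:
    """Return {K: (mean, std)} of the bias over `n_runs` independent model runs.

    `bias_fn(run, k)` is a hypothetical callable returning the bias of the model
    trained in run `run`, evaluated on its top-`k` recommendations.
    """
    summary: Dict[int, Tuple[float, float]] = {}
    for k in k_grid:
        values = [bias_fn(run, k) for run in range(n_runs)]
        # Sample standard deviation (n - 1 denominator): an assumption.
        summary[k] = (mean(values), stdev(values))
    return summary


# Toy usage with a dummy bias function standing in for trained models.
summary = summarize_runs(lambda run, k: float(run))
for k, (m, s) in list(summary.items())[:2]:
    print(f"K={k}: {m:.2f} +/- {s:.2f}")
```

In this sketch, a mean above 0 corresponds to a point above the "No bias" line, and the pair (mean - std, mean + std) gives the plotted interval for each $K$.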