A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability

Li-Yang Tseng, Tzu-Ling Lin, Hong-Han Shuai, Jen-Wei Huang, Wen-Whei Chang

TL;DR

This work defines music memorability regression (MMR) and introduces the YouTube Music Memorability (YTMM) dataset collected via a novel memory-game procedure to obtain reliable memorability scores. It evaluates both handcrafted, interpretable features and end-to-end mel-spectrogram–based approaches, with a strong emphasis on interpretability using SHAP. The experiments show that explainable handcrafted features (EHC) paired with SVR/MLP provide the best correlations under limited data, while self-supervised transformers benefit from pitch-shift augmentation. The dataset and baselines enable systematic study of which musical attributes drive memorability, with potential applications in recommendation and style transfer, and the work highlights the importance of data efficiency and interpretability in music memorability research.

Abstract

Nowadays, humans are constantly exposed to music, whether through voluntary streaming services or incidental encounters during commercial breaks. Despite the abundance of music, certain pieces remain more memorable and often gain greater popularity. Inspired by this phenomenon, we focus on measuring and predicting music memorability. To achieve this, we collect a new music piece dataset with reliable memorability labels using a novel interactive experimental procedure. We then train baselines to predict and analyze music memorability, leveraging both interpretable features and audio mel-spectrograms as inputs. To the best of our knowledge, we are the first to explore music memorability using data-driven deep learning-based methods. Through a series of experiments and ablation studies, we demonstrate that while there is room for improvement, predicting music memorability with limited data is possible. Certain intrinsic elements, such as higher valence, arousal, and faster tempo, contribute to memorable music. As prediction techniques continue to evolve, real-life applications like music recommendation systems and music style transfer will undoubtedly benefit from this new area of research.

Paper Structure (15 sections, 1 equation, 6 figures, 4 tables)

Figures (6)

  • Figure 1: The music memory game, which allows data annotators to label music memorability scores reliably. The experiment is divided into three stages, separated by 3-minute breaks. Each 18-minute stage is composed of multiple 5-second music pieces and short breaks.
  • Figure 2: Distributions of the publication locations and view counts of the audio in the final dataset.
  • Figure 3: Memorability scores at various stages. The color symbolizes the rank of short-term memorability, while the lines represent stage relationships. The plot also shows Spearman's rank correlations $\rho$ between memorabilities measured at each stage.
  • Figure 4: Relations between memorability score and target repeat interval in log scale. The hue represents the level of fatigue.
  • Figure 5: SHAP summary of the SVR model with an RBF kernel. The most important features are listed in decreasing order of importance; a feature whose value rises together with its SHAP value has a positive relationship with the predicted memorability.
  • ...and 1 more figures
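
Figure 3 reports Spearman's rank correlations $\rho$ between memorability scores measured at different stages, and the TL;DR compares baselines by their correlations. As a minimal sketch of how such a rank correlation can be computed, the snippet below ranks two score lists (tied values receive the mean of their rank positions) and takes the Pearson correlation of the ranks. The function names `average_ranks` and `spearman_rho` and the example numbers are hypothetical illustrations; they are not taken from the paper or its code.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x = sum(rx) / len(rx)
    mean_y = sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up memorability scores for four clips, not data from YTMM.
    measured = [0.9, 0.4, 0.7, 0.2]
    predicted = [0.6, 0.5, 0.8, 0.1]
    print(f"Spearman rho = {spearman_rho(measured, predicted):.3f}")
```

Because only the ordering of the scores matters, the same function applies whether the two lists are scores from two stages of the memory game or a model's predictions against measured scores.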