Can pre-trained Deep Learning models predict groove ratings?

Axel Marmoret, Nicolas Farrugia, Jan Alexander Stupacher

Abstract

This study explores the extent to which deep learning models can predict groove and its related perceptual dimensions directly from audio signals. We critically examine the effectiveness of seven state-of-the-art deep learning models in predicting groove ratings and responses to groove-related queries through the extraction of audio embeddings. Additionally, we compare these predictions with traditional handcrafted audio features. To better understand the underlying mechanics, we extend this methodology to analyze predictions based on source-separated instruments, thereby isolating the contributions of individual musical elements. Our analysis reveals a clear separation of groove characteristics driven by the underlying musical style of the tracks (funk, pop, and rock). These findings indicate that deep audio representations can successfully encode complex, style-dependent groove components that traditional features often miss. Ultimately, this work highlights the capacity of advanced deep learning models to capture the multifaceted concept of groove, demonstrating the strong potential of representation learning to advance predictive Music Information Retrieval methodologies.
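
The embedding-based prediction described in the abstract can be illustrated with a minimal sketch: a regularized linear regressor fitted on frozen embeddings and scored on held-out songs. Everything below is an assumption made for illustration, not the authors' pipeline: the data are synthetic, the embedding dimension of 32 is arbitrary, and ridge regression with 5-fold cross-validation stands in for whatever regressor and evaluation protocol the paper actually uses. Only the number of songs (148) is taken from the figure captions.

```python
import numpy as np

def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def cross_val_predict(X, y, n_folds=5, alpha=1.0, seed=0):
    """Out-of-fold predictions: each song is scored by a model that never saw it."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    preds = np.empty(len(y))
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        w, b = ridge_fit(X[train_mask], y[train_mask], alpha)
        preds[test_idx] = X[test_idx] @ w + b
    return preds

# Synthetic stand-ins (assumption): random "embeddings" and noisy linear "ratings".
rng = np.random.default_rng(42)
n_songs, dim = 148, 32
embeddings = rng.normal(size=(n_songs, dim))
true_w = rng.normal(size=dim)
ratings = embeddings @ true_w + rng.normal(scale=0.5, size=n_songs)

predicted = cross_val_predict(embeddings, ratings)
r = np.corrcoef(predicted, ratings)[0, 1]
print(f"out-of-fold Pearson r = {r:.2f}")
```

With real data, `embeddings` would hold one pooled vector per song from a pre-trained model, and `ratings` the per-song mean rating across participants. Replacing `embeddings` with a matrix of handcrafted features would give the baseline comparison under the same protocol.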

Paper Structure

This paper contains 18 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Distribution of the groove ratings for all 148 songs. For each song, ratings are averaged across all participants.
  • Figure 2: Scatter plots of the predicted values against the ground truth values, for all ratings, and for the MuQ embeddings.
  • Figure 3: Projection onto the first two principal components (PCA) of MuQ audio embeddings computed from source-separated instrumental stems.
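
The kind of projection shown in Figure 3 can be sketched as follows. This is a hedged illustration only: the embeddings are synthetic, the stem names (`drums`, `bass`, `other`) and the dimension of 16 are hypothetical, and PCA is computed with a plain SVD rather than any particular library the authors may have used.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project rows of X onto their leading principal components via SVD."""
    Xc = X - X.mean(axis=0)                      # PCA requires centered data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:n_components].T            # coordinates in PC space
    explained = (S ** 2) / np.sum(S ** 2)        # variance ratio per component
    return coords, explained[:n_components]

# Synthetic stand-ins (assumption): one cluster of fake embeddings per stem.
rng = np.random.default_rng(0)
stems = ["drums", "bass", "other"]               # hypothetical stem names
per_stem, dim = 30, 16
blocks, labels = [], []
for stem in stems:
    center = rng.normal(scale=3.0, size=dim)     # each stem gets its own offset
    blocks.append(center + rng.normal(size=(per_stem, dim)))
    labels += [stem] * per_stem
embeddings = np.vstack(blocks)

coords, explained = pca_project(embeddings)
for stem in stems:
    mask = np.array(labels) == stem
    print(stem, coords[mask].mean(axis=0).round(2))
print("explained variance ratio:", explained.round(2))
```

Plotting `coords[:, 0]` against `coords[:, 1]`, colored by `labels`, would give a scatter plot of the same kind as the figure.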