Progressive Rock Music Classification

Arpan Nagar, Joseph Bensabat, Jokent Gaza, Moinak Dey

TL;DR

This work tackles prog rock genre classification within Music Information Retrieval by comparing a spectrum of models, from ensemble methods and 1D CNNs (Zuck and Satya) to a state-of-the-art Audio Spectrogram Transformer (AST). It relies on rich audio features (spectrograms, MFCCs, chromagrams, beat positions) extracted with Librosa and aggregated via a winner-take-all voting scheme across song snippets. Key findings show strong validation performance for bagging ensembles (e.g., Random Forest ~95.5%) and competitive test results for ExtraTrees (~76.38%), while the AST achieves solid recall and about 79.4% overall test accuracy; Zuck and Satya offer distinct precision-recall profiles suited to different deployment needs. The results highlight the potential of ensemble methods and transformers for nuanced genre tasks, while also pointing to label ambiguities in post-prog subgenres and opportunities for future work in CNN/Transformer hybrids and semi-supervised learning.

Abstract

This study investigates the classification of progressive rock music, a genre characterized by complex compositions and diverse instrumentation, distinct from other musical styles. Addressing this Music Information Retrieval (MIR) task, we extracted comprehensive audio features, including spectrograms, Mel-Frequency Cepstral Coefficients (MFCCs), chromagrams, and beat positions from song snippets using the Librosa library. A winner-take-all voting strategy was employed to aggregate snippet-level predictions into final song classifications. We conducted a comparative analysis of various machine learning techniques. Ensemble methods, encompassing Bagging (Random Forest, ExtraTrees, Bagging Classifier) and Boosting (XGBoost, Gradient Boosting), were explored, utilizing Principal Component Analysis (PCA) for dimensionality reduction to manage computational constraints with high-dimensional feature sets. Additionally, deep learning approaches were investigated, including the development of custom 1D Convolutional Neural Network (1D CNN) architectures (named "Zuck" and "Satya") featuring specific layer configurations, normalization, and activation functions. Furthermore, we fine-tuned a state-of-the-art Audio Spectrogram Transformer (AST) model, leveraging its attention-based mechanisms for audio classification. Performance evaluation on validation and test sets revealed varying effectiveness across models, with ensemble methods like ExtraTrees achieving test accuracies up to 76.38%. This research provides insights into the application and relative performance of diverse machine learning paradigms for the nuanced task of progressive rock genre classification.
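
The winner-take-all aggregation mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `winner_take_all`, the label strings `"prog"` and `"non_prog"`, and the tie-breaking rule (the label seen first wins) are all assumptions made for illustration, since the text here does not specify them.

```python
from collections import Counter


def winner_take_all(snippet_labels):
    """Return the label predicted for the largest number of snippets.

    Assumption: ties go to the label encountered first, which is how
    Counter.most_common orders equal counts. The paper's actual
    tie-breaking rule is not stated in this summary.
    """
    if not snippet_labels:
        raise ValueError("need at least one snippet prediction")
    counts = Counter(snippet_labels)
    label, _ = counts.most_common(1)[0]
    return label


# Hypothetical snippet-level predictions for one song cut into 10 snippets.
snippets = ["prog"] * 7 + ["non_prog"] * 3
song_label = winner_take_all(snippets)
```

Here the song would be assigned the class that most of its snippets received, regardless of how confident any individual snippet prediction was.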

Paper Structure

This paper contains 27 sections, 15 figures, 8 tables.

Figures (15)

  • Figure 1: Visualizing sequence input (i) Spectrogram (ii) MFCCs (iii) Chromagram (iv) Beat Position for a 10 second snippet of Another One Bites the Dust by Queen
  • Figure 2: Classifying 50 Snippets of Toxicological Whispering by Amon Düül II using our winner-take-all voting strategy
  • Figure 3: Classifying 50 Snippets of All the Stars by Kendrick Lamar & SZA using our winner-take-all voting strategy
  • Figure 4: Flow chart describing how we've combined PCA with ensemble methods
  • Figure 5: Random Forest feature importance out of 34560 features
  • ...and 10 more figures