Progressive Rock Music Classification
Arpan Nagar, Joseph Bensabat, Jokent Gaza, Moinak Dey
TL;DR
This work tackles progressive rock genre classification within Music Information Retrieval by comparing a spectrum of models, from ensemble methods and custom 1D CNNs (named "Zuck" and "Satya") to a state-of-the-art Audio Spectrogram Transformer (AST). It relies on rich audio features (spectrograms, MFCCs, chromagrams, beat positions) extracted from song snippets with Librosa, and aggregates snippet-level predictions into song-level labels via a winner-take-all voting scheme. Bagging ensembles show strong validation performance (e.g., Random Forest at about 95.5%), and Extra Trees reaches a competitive test accuracy of 76.38%. The AST achieves solid recall and about 79.4% overall test accuracy, while Zuck and Satya offer distinct precision-recall profiles suited to different deployment needs. The results highlight the potential of ensemble methods and transformers for nuanced genre tasks, while also pointing to label ambiguities in post-prog subgenres and to opportunities for future work on CNN/Transformer hybrids and semi-supervised learning.
Abstract
This study investigates the classification of progressive rock, a genre whose complex compositions and diverse instrumentation set it apart from other musical styles. To address this Music Information Retrieval (MIR) task, we extracted audio features from song snippets using the Librosa library, including spectrograms, Mel-Frequency Cepstral Coefficients (MFCCs), chromagrams, and beat positions. A winner-take-all voting strategy aggregated snippet-level predictions into a final classification for each song. We compared several families of machine learning techniques. Among ensemble methods, we explored Bagging (Random Forest, Extra Trees, Bagging Classifier) and Boosting (XGBoost, Gradient Boosting), applying Principal Component Analysis (PCA) for dimensionality reduction to keep the high-dimensional feature sets computationally tractable. Among deep learning approaches, we developed custom 1D Convolutional Neural Network (1D CNN) architectures (named "Zuck" and "Satya"), each with its own layer configuration, normalization, and activation functions. We also fine-tuned a state-of-the-art Audio Spectrogram Transformer (AST), leveraging its attention-based mechanisms for audio classification. Evaluation on validation and test sets revealed varying effectiveness across models, with ensemble methods such as Extra Trees achieving test accuracies of up to 76.38%. This research provides insights into the application and relative performance of diverse machine learning paradigms for the nuanced task of progressive rock genre classification.
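The winner-take-all voting strategy mentioned in the abstract can be illustrated with a minimal sketch. The function name, the (song_id, label) input format, the label encoding (1 for progressive rock, 0 for everything else), and the tie-breaking rule are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter, defaultdict


def aggregate_song_predictions(snippet_predictions):
    """Aggregate snippet-level labels into one label per song by majority vote.

    snippet_predictions: iterable of (song_id, predicted_label) pairs,
    one pair per snippet. This input format is hypothetical.
    """
    # Count how many snippets of each song voted for each label.
    votes = defaultdict(Counter)
    for song_id, label in snippet_predictions:
        votes[song_id][label] += 1

    song_labels = {}
    for song_id, counts in votes.items():
        top_count = max(counts.values())
        # Assumed tie-break: choose the smallest label among the tied ones.
        # With the assumed encoding, a tie resolves to 0 (non-prog).
        song_labels[song_id] = min(
            label for label, count in counts.items() if count == top_count
        )
    return song_labels


if __name__ == "__main__":
    # Hypothetical snippet predictions for two songs.
    predictions = [
        ("song_a", 1), ("song_a", 1), ("song_a", 0),
        ("song_b", 0), ("song_b", 1),
    ]
    print(aggregate_song_predictions(predictions))
```

In this sketch every snippet carries equal weight, so a song is labeled progressive rock only when more of its snippets are predicted as such than not. A different tie-breaking rule, or weighting snippets by classifier confidence, would be an equally plausible design.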
