Convolutional Neural Network Achieves Human-level Accuracy in Music Genre Classification

Mingwen Dong

TL;DR

This paper addresses automatic music genre classification by training a convolutional neural network (CNN) on log-mel spectrogram segments and aggregating the segment-level predictions to classify entire tracks. The design draws on psychophysics (human genre-classification studies) and auditory neurophysiology: the network analyzes 3-second segments, and its learned filters resemble spectrotemporal receptive fields (STRFs). It reaches human-level accuracy (about 70%) on 10-genre classification, outperforming the 61% reported for earlier methods based on hand-engineered features, and its learned features remap the input into a linearly separable representation. The work points to practical uses in music recommendation and other music information retrieval (MIR) tasks, and the results lend support to the biological plausibility of the learned spectrotemporal features.
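The segmentation step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a 22,050 Hz sample rate, a hop length of 512 samples, 128 mel bands and non-overlapping segments, none of which are specified in this summary. The function names are hypothetical. A spectrogram is represented as a plain list of frames (one list of mel-band values per frame) so that the sketch needs only the standard library.

```python
from typing import List, Sequence

Frame = List[float]  # one spectrogram frame: mel-band values at one time step


def frames_per_segment(segment_seconds: float, sample_rate: int, hop_length: int) -> int:
    """Number of whole spectrogram frames that cover `segment_seconds` of audio.

    Assumption: frames advance by `hop_length` samples (hypothetical values below).
    """
    return int(segment_seconds * sample_rate // hop_length)


def split_into_segments(spectrogram: Sequence[Frame], seg_frames: int) -> List[List[Frame]]:
    """Cut a (time x mel) spectrogram into non-overlapping segments of `seg_frames` frames.

    A trailing partial segment is dropped so that every CNN input has the same shape.
    """
    n_full = len(spectrogram) // seg_frames
    return [list(spectrogram[i * seg_frames:(i + 1) * seg_frames]) for i in range(n_full)]


# Usage with hypothetical numbers: 3-second segments, 22,050 Hz audio, hop length 512.
seg = frames_per_segment(3.0, 22050, 512)
dummy_track = [[0.0] * 128 for _ in range(1292)]  # placeholder 1292-frame, 128-band spectrogram
segments = split_into_segments(dummy_track, seg)
print(len(segments), len(segments[0]))  # prints "10 129"
```

Dropping the remainder keeps every CNN input the same size; padding the last segment or using overlapping windows would be equally plausible choices, and the summary does not say which one the paper uses.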

Abstract

Music genre classification is one example of content-based analysis of music signals. Traditionally, human-engineered features have been used to automate this task, reaching 61% accuracy on 10-genre classification. This is still below the roughly 70% accuracy that humans achieve on the same task. Here, we propose a new method that combines findings from human perceptual studies of music genre classification with the neurophysiology of the auditory system. The method trains a simple convolutional neural network (CNN) to classify a short segment of the music signal. The genre of a whole track is then determined by splitting it into short segments and combining the CNN's predictions across all segments. After training, this method achieves human-level (70%) accuracy, and the filters learned by the CNN resemble the spectrotemporal receptive fields (STRFs) of the auditory system.
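The abstract says only that the CNN's segment-level predictions are "combined"; it does not state the rule. The sketch below shows two common options, both assumptions rather than the paper's confirmed method: averaging the per-segment class probabilities and taking the argmax, or letting each segment vote for its top class. Function names and the probability values are hypothetical.

```python
from collections import Counter
from typing import Sequence


def mean_probability_vote(segment_probs: Sequence[Sequence[float]]) -> int:
    """Average per-segment class probabilities; return the index of the largest mean.

    Assumed aggregation rule (not confirmed by the source). Ties go to the lowest index.
    """
    if not segment_probs:
        raise ValueError("track produced no segments")
    n_classes = len(segment_probs[0])
    totals = [0.0] * n_classes
    for probs in segment_probs:
        for k, p in enumerate(probs):
            totals[k] += p
    means = [t / len(segment_probs) for t in totals]
    return max(range(n_classes), key=lambda k: means[k])


def majority_vote(segment_probs: Sequence[Sequence[float]]) -> int:
    """Each segment votes for its most probable class; return the most-voted class.

    Alternative assumed rule. Ties between classes go to the lowest class index.
    """
    if not segment_probs:
        raise ValueError("track produced no segments")
    votes = Counter(max(range(len(p)), key=lambda k: p[k]) for p in segment_probs)
    best = max(votes.values())
    return min(k for k, v in votes.items() if v == best)


# Hypothetical softmax outputs for three segments of one track, two genres.
probs = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
print(mean_probability_vote(probs), majority_vote(probs))  # prints "0 1"
```

The two rules can disagree. Here one confident segment outweighs two uncertain ones under averaging, while majority voting ignores confidence and follows the segment count.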

Figures (5)