Music Genre Classification: Training an AI model
Keoikantse Mogonediwa
TL;DR
The paper investigates music genre classification by comparing four algorithms on the GTZAN dataset: a Multilayer Perceptron (MLP), a Convolutional Neural Network (CNN), k-Nearest Neighbours (KNN), and a Random Forest wide model. All four use features extracted from audio signals via the Short-Time Fourier Transform (STFT) and MFCCs. The Random Forest approach yields the highest accuracy (~84%), while CNN, MLP, and KNN show substantially lower performance under the reported configurations. The study discusses data quality issues, including a couple of corrupted jazz files and potential class imbalance, and emphasizes the role of STFT-based features in enabling effective classification. The results have practical implications for building robust, audio-based genre classifiers in streaming and music information retrieval systems.
Abstract
Music genre classification applies machine learning models and techniques to the processing of audio signals, with applications such as content and music recommendation systems. In this research I explore various machine learning algorithms for music genre classification, using features extracted from audio signals. The systems are a Multilayer Perceptron (built from scratch), a k-Nearest Neighbours classifier (also built from scratch), a Convolutional Neural Network, and a Random Forest wide model. To process the audio signals, I perform feature extraction using the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCCs). Through this research, I aim to assess the robustness of machine learning models for genre classification and to compare their results.
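To make the idea of a k-Nearest Neighbours classifier "built from scratch" concrete, the following is a minimal sketch in plain Python. It is not the implementation used in this work: the function names (`euclidean`, `knn_predict`), the choice of Euclidean distance, the value of k, and the toy two-dimensional feature vectors are all assumptions made for illustration. In practice each vector would hold features derived from the STFT or MFCCs of a track.

```python
from collections import Counter
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training vectors closest to query.

    Hypothetical helper for illustration; distance metric and k are assumptions.
    """
    # Rank training indices by distance to the query and keep the k nearest.
    nearest = sorted(
        range(len(train_X)), key=lambda i: euclidean(train_X[i], query)
    )[:k]
    # Majority vote over the labels of the k nearest neighbours.
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy 2-D vectors standing in for per-track audio features (made-up values).
train_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
train_y = ["jazz", "jazz", "jazz", "metal", "metal", "metal"]

pred_a = knn_predict(train_X, train_y, [0.1, 0.1], k=3)
pred_b = knn_predict(train_X, train_y, [5.0, 5.0], k=3)
print(pred_a, pred_b)
```

The sketch sorts all training points for every query, which is simple but scales linearly with the training set size. For a dataset the size of GTZAN this is usually acceptable, though a real pipeline would likely also normalise the features before computing distances.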
