Music Genre Classification: Training an AI model

Keoikantse Mogonediwa

TL;DR

The paper investigates music genre classification by comparing four algorithms—Multilayer Perceptron (MLP), Convolutional Neural Network (CNN), K-Nearest Neighbours (KNN), and a Random Forest wide model—using features extracted from audio signals via Short-Time Fourier Transform and MFCCs on the GTZAN dataset. It finds that the Random Forest approach yields the highest accuracy (~84%), while CNN, MLP, and KNN show substantially lower performance under the reported configurations. The study discusses data quality issues, including a couple of corrupted jazz files and potential class imbalance, and emphasizes the role of STFT-based features in enabling effective classification. The results have practical implications for building robust, audio-based genre classifiers in streaming and music information retrieval systems.

Abstract

Music genre classification applies machine learning models and techniques to the processing of audio signals, with applications ranging from content recommendation systems to music recommendation systems. In this research I explore various machine learning algorithms for music genre classification, using features extracted from audio signals. The systems are a Multilayer Perceptron (built from scratch), a k-Nearest Neighbours classifier (also built from scratch), a Convolutional Neural Network, and a Random Forest wide model. To process the audio signals, feature extraction methods such as the Short-Time Fourier Transform and the extraction of Mel-Frequency Cepstral Coefficients (MFCCs) are applied. Through this extensive research, I aim to assess the robustness of machine learning models for genre classification and to compare their results.
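The Short-Time Fourier Transform named in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: the function names `hann_window` and `stft_magnitude`, the frame size, the hop length, and the end-of-signal zero-padding are all assumptions chosen for illustration, and a naive DFT is used in place of a library FFT so the example needs only the standard library. MFCC extraction is not shown.

```python
import cmath
import math


def hann_window(length):
    """Periodic Hann window (hypothetical helper, not from the paper)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / length) for n in range(length)]


def stft_magnitude(signal, frame_size=64, hop=32):
    """Return one magnitude spectrum (bins 0..frame_size//2) per frame.

    Assumption: the signal is zero-padded at the end so that the final
    frame is complete.
    """
    window = hann_window(frame_size)
    n_frames = max(1, math.ceil((len(signal) - frame_size) / hop) + 1)
    padded_len = (n_frames - 1) * hop + frame_size
    padded = list(signal) + [0.0] * (padded_len - len(signal))

    spectrogram = []
    for i in range(n_frames):
        start = i * hop
        # Apply the window to the current frame.
        frame = [padded[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT of the windowed frame, non-negative frequencies only.
        spectrum = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n in range(frame_size)
            )
            spectrum.append(abs(acc))
        spectrogram.append(spectrum)
    return spectrogram


# Usage: a pure tone that completes 8 cycles per 64-sample frame.
tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
spec = stft_magnitude(tone, frame_size=64, hop=32)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spec]
print(len(spec), "frames; peak bin per frame:", peak_bins)
```

For this synthetic tone the energy concentrates in frequency bin 8 of every frame. On real audio, each row of the result corresponds to one column of a spectrogram like those listed in the figures.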

Paper Structure

This paper contains 15 sections and 10 figures.

Figures (10)

  • Figure 1: Typical flowchart demonstrating the steps involved in music genre classification
  • Figure 2: Metadata information retrieved from the dataset
  • Figure 3: STFT of a randomly selected Reggae audio file
  • Figure 4: STFT of a randomly selected Reggae audio file in which padding has been applied
  • Figure 5: Spectrogram of the selected reggae genre audio file
  • ...and 5 more figures