
The Effect of Perceptual Metrics on Music Representation Learning for Genre Classification

Tashi Namgyal, Alexander Hepburn, Raul Santos-Rodriguez, Valero Laparra, Jesus Malo

TL;DR

Features extracted from autoencoders trained with perceptual losses can improve performance on a music understanding task, genre classification, compared with using the same metrics directly as distances when learning a classifier.

Abstract

The subjective quality of natural signals can be approximated with objective perceptual metrics. Designed to approximate the perceptual behaviour of human observers, perceptual metrics often reflect structures found in natural signals and neurological pathways. Models trained with perceptual metrics as loss functions can capture perceptually meaningful features from the structures these metrics encode. We demonstrate that using features extracted from autoencoders trained with perceptual losses can improve performance on a music understanding task, genre classification, compared with using these metrics directly as distances when learning a classifier. This result suggests improved generalisation to novel signals when perceptual metrics are used as loss functions for representation learning.
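The figures listed for this paper compare three metrics: MSE, $1-\text{MS-SSIM}$ and NLPD. As a reference point, the first two can be written in their standard forms below. This is a sketch assuming the usual definitions; the window size, number of scales $M$, exponents $\alpha_M, \beta_j, \gamma_j$ and constants $C_1, C_2, C_3$ used in the paper are not given in this summary. Here $x$ and $y$ are two inputs with $N$ elements each (e.g. spectrograms); $\mu$, $\sigma$ and $\sigma_{xy}$ are local means, standard deviations and covariance computed over sliding windows, with the resulting maps averaged; and the subscript $j$ marks quantities computed at the $j$-th of $M$ progressively downsampled scales.

```latex
\begin{align*}
d_{\text{MSE}}(x, y) &= \frac{1}{N} \sum_{i=1}^{N} (x_i - y_i)^2 \\
l(x, y) &= \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \qquad
c(x, y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \qquad
s(x, y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3} \\
\text{MS-SSIM}(x, y) &= \big[l_M(x, y)\big]^{\alpha_M} \prod_{j=1}^{M} \big[c_j(x, y)\big]^{\beta_j} \big[s_j(x, y)\big]^{\gamma_j} \\
d_{\text{MS-SSIM}}(x, y) &= 1 - \text{MS-SSIM}(x, y)
\end{align*}
```

NLPD is more involved and is not written out here: it compares the two inputs after decomposing each into a Laplacian pyramid and applying divisive normalisation to every subband.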

Paper Structure (7 sections, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Violin plot showing the distribution of pairwise distances between songs for each genre in the filtered GTZAN dataset. MSE, $1-\text{MS-SSIM}$ & NLPD produce different distributions. The 'all' column shows pairwise distances across the whole dataset; each genre column shows pairwise distances within that genre. Within-genre distributions are more spread out for NLPD than for MSE. This spread is detrimental to clustering, but advantageous for reconstruction, where small perceptual differences need to be amplified so that they have more impact on the loss.
  • Figure 2: Weighted F1 score on the validation set for KNN classifiers using MSE, $1-\text{MS-SSIM}$ & NLPD as distances between neighbours. Squares show the number of neighbours, $k$, chosen for each model.
  • Figure 3: Confusion matrices for KNN classifiers using MSE, NLPD and $1-\text{MS-SSIM}$ as distances between neighbours.
  • Figure 4: Confusion matrices for logistic regression classifiers using latent features from autoencoders trained on uniform noise with MSE, NLPD and $1-\text{MS-SSIM}$ distances as losses.
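Figures 2 and 3 describe k-nearest-neighbour classifiers that use each metric directly as the distance between songs, with $k$ chosen per metric from the weighted F1 score on a validation set. The NumPy sketch below illustrates that selection loop under stated assumptions: inputs are equally shaped arrays (e.g. spectrogram patches), only MSE is implemented (a $1-\text{MS-SSIM}$ or NLPD function could be passed as the `dist` callable instead), neighbours vote by simple majority with ties going to the smallest label, and the toy data is synthetic. The helper names (`pairwise_distances`, `knn_predict`, `weighted_f1`, `select_k`) are hypothetical and not the authors' code.

```python
import numpy as np


def mse_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error between two equally shaped arrays."""
    return float(np.mean((x - y) ** 2))


def pairwise_distances(a, b, dist):
    """Return D with D[i, j] = dist(a[i], b[j]) for any distance callable."""
    return np.array([[dist(x, y) for y in b] for x in a])


def knn_predict(d: np.ndarray, train_labels: np.ndarray, k: int) -> np.ndarray:
    """Majority vote over the k training items with the smallest distances.

    d has shape (n_query, n_train). Ties in the vote go to the smallest label
    (an assumption; the paper's tie-breaking rule is not stated).
    """
    nearest = np.argsort(d, axis=1)[:, :k]  # indices of the k closest items
    votes = train_labels[nearest]  # shape (n_query, k)
    preds = []
    for row in votes:
        values, counts = np.unique(row, return_counts=True)  # values sorted ascending
        preds.append(values[np.argmax(counts)])  # first maximum = smallest label
    return np.array(preds)


def weighted_f1(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Per-class F1 scores averaged with weights equal to each class's support."""
    classes, support = np.unique(y_true, return_counts=True)
    total = 0.0
    for c, n in zip(classes, support):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        total += n * (2 * tp / denom if denom > 0 else 0.0)
    return float(total / len(y_true))


def select_k(d_val, train_labels, val_labels, ks):
    """Pick the k with the highest validation weighted F1 (first k wins ties)."""
    scores = {k: weighted_f1(val_labels, knn_predict(d_val, train_labels, k)) for k in ks}
    best_k = max(scores, key=scores.get)
    return best_k, scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def make(n: int, offset: float) -> np.ndarray:
        # Synthetic stand-in for n 8x8 spectrogram patches from one "genre"
        return offset + 0.1 * rng.standard_normal((n, 8, 8))

    x_train = np.concatenate([make(20, 0.0), make(20, 1.0)])
    y_train = np.array([0] * 20 + [1] * 20)
    x_val = np.concatenate([make(10, 0.0), make(10, 1.0)])
    y_val = np.array([0] * 10 + [1] * 10)

    d_val = pairwise_distances(x_val, x_train, mse_distance)
    best_k, scores = select_k(d_val, y_train, y_val, ks=[1, 3, 5, 7])
    print(f"best k = {best_k}, weighted F1 = {scores[best_k]:.3f}")
```

Because the classifier only sees the precomputed distance matrix, swapping one metric for another leaves the rest of the pipeline unchanged, which keeps a per-metric comparison like the one in Figure 2 direct.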