Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation

Jan Retkowski, Jakub Stępniak, Mateusz Modrzejewski

TL;DR

The paper tackles the absence of a domain-specific, objective metric for evaluating symbolic music generation. It introduces Frechet Music Distance (FMD), a distributional distance between embeddings of reference and generated symbolic music, built on Frechet-distance concepts and leveraging advanced symbolic-music encoders. The authors validate FMD across diverse datasets (e.g., MAESTRO, MidiCaps) and generative models (MMT, FolkRNN, GPT-2), and provide a Python toolkit to compute FMD variants for different modalities (MIDI and ABC). They also explore robustness aspects, including outlier detection and estimation methods, and discuss limitations such as embedding biases and preprocessing sensitivity, outlining future directions to solidify FMD as a reproducible standard for symbolic-music evaluation.

Abstract

In this paper we introduce the Frechet Music Distance (FMD), a novel evaluation metric for generative symbolic music models, inspired by the Frechet Inception Distance (FID) in computer vision and Frechet Audio Distance (FAD) in generative audio. FMD calculates the distance between distributions of reference and generated symbolic music embeddings, capturing abstract musical features. We validate FMD across several datasets and models. Results indicate that FMD effectively differentiates model quality, providing a domain-specific metric for evaluating symbolic music generation, and establishing a reproducible standard for future research in symbolic music modeling.
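
The distance described in the abstract can be sketched in a few lines. This is a minimal illustration, not the authors' toolkit: it assumes the embeddings have already been produced by some symbolic-music encoder (not shown), that each set is summarized by a multivariate Gaussian as in Figure 1, and that the usual FID-style closed form applies, i.e. ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The function name `frechet_distance` and the random test data are hypothetical.

```python
import numpy as np


def frechet_distance(ref_emb: np.ndarray, gen_emb: np.ndarray) -> float:
    """FID-style (squared) Frechet distance between Gaussians fitted to two
    embedding sets, each of shape (n_samples, dim).

    Hypothetical helper for illustration; not the API of the paper's toolkit.
    """
    # Fit a Gaussian to each set: sample mean and sample covariance.
    mu_r, mu_g = ref_emb.mean(axis=0), gen_emb.mean(axis=0)
    sigma_r = np.cov(ref_emb, rowvar=False)
    sigma_g = np.cov(gen_emb, rowvar=False)

    # Tr((Sigma_r Sigma_g)^(1/2)) equals the sum of square roots of the
    # eigenvalues of Sigma_r Sigma_g, which are real and non-negative for
    # covariance matrices. Clip tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g).real
    tr_covmean = np.sqrt(np.clip(eigvals, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * tr_covmean)


if __name__ == "__main__":
    # Stand-in data: random vectors in place of real music embeddings.
    rng = np.random.default_rng(0)
    reference = rng.normal(size=(500, 8))
    generated = rng.normal(loc=0.5, size=(500, 8))
    print(frechet_distance(reference, generated))
```

Two sanity checks follow from the formula: the distance between a set and itself is zero up to round-off, and shifting every embedding by a constant vector leaves the covariance unchanged, so the distance equals the squared length of the shift.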

Paper Structure

This paper contains 15 sections, 1 equation, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Schematic overview of Frechet Music Distance computation between multivariate Gaussians estimated on the embeddings of generated symbolic music and embeddings of a reference set of music.
  • Figure 2: FMD for GPT-2, ABC evaluation.
  • Figure 3: FMD for GPT-2, MIDI evaluation.
  • Figure 4: Distribution of per-song FMD between MAESTRO and MidiCaps classical piano.
  • Figure 5: FMD errors for different mean and covariance estimation methods computed on combinations of MidiCaps subsets.
  • ...and 1 more figure