Frechet Music Distance: A Metric For Generative Symbolic Music Evaluation
Jan Retkowski, Jakub Stępniak, Mateusz Modrzejewski
TL;DR
The paper addresses the lack of a domain-specific, objective metric for evaluating symbolic music generation. It introduces Frechet Music Distance (FMD), a distance between the distributions of embeddings of reference and generated symbolic music, based on the Frechet distance and computed with symbolic-music encoders. The authors validate FMD across diverse datasets (e.g., MAESTRO, MidiCaps) and generative models (MMT, FolkRNN, GPT-2), and provide a Python toolkit that computes FMD variants for two symbolic formats (MIDI and ABC). They also examine robustness, including outlier detection and alternative estimation methods, and discuss limitations such as embedding biases and sensitivity to preprocessing. Finally, they outline future directions for establishing FMD as a reproducible standard for symbolic-music evaluation.
Abstract
In this paper, we introduce the Frechet Music Distance (FMD), a novel evaluation metric for generative symbolic music models, inspired by the Frechet Inception Distance (FID) in computer vision and the Frechet Audio Distance (FAD) in generative audio. FMD calculates the distance between the distributions of reference and generated symbolic music embeddings, capturing abstract musical features. We validate FMD across several datasets and models. Results indicate that FMD effectively differentiates model quality, providing a domain-specific metric for evaluating symbolic music generation and establishing a reproducible standard for future research in symbolic music modeling.
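To make the idea of a distance between embedding distributions concrete, the following is a minimal sketch of the Frechet distance as it is commonly computed for FID: each embedding set is summarized by a Gaussian (mean and covariance), and the distance is ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)). It assumes FMD follows this Gaussian formulation; the function name `frechet_distance` is hypothetical and is not the API of the authors' toolkit, and the random arrays merely stand in for embeddings that a symbolic-music encoder would produce.

```python
import numpy as np


def frechet_distance(ref_emb, gen_emb):
    """Frechet distance between Gaussians fitted to two embedding sets.

    Both inputs are arrays of shape (n_samples, dim) with dim > 1.
    Hypothetical helper for illustration, not the authors' toolkit API.
    """
    ref_emb = np.asarray(ref_emb, dtype=np.float64)
    gen_emb = np.asarray(gen_emb, dtype=np.float64)

    # Fit a Gaussian to each set: sample mean and sample covariance.
    mu_r, mu_g = ref_emb.mean(axis=0), gen_emb.mean(axis=0)
    sigma_r = np.cov(ref_emb, rowvar=False)
    sigma_g = np.cov(gen_emb, rowvar=False)

    # Tr((S_r S_g)^(1/2)) equals the sum of square roots of the eigenvalues
    # of S_r S_g, which are real and non-negative for PSD covariances.
    # Clipping guards against tiny negative values from round-off.
    eigvals = np.linalg.eigvals(sigma_r @ sigma_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(sigma_r) + np.trace(sigma_g) - 2.0 * trace_sqrt)


# Placeholder data: random vectors standing in for encoder embeddings.
rng = np.random.default_rng(0)
reference = rng.normal(size=(500, 8))
generated_same = reference.copy()   # identical distribution
generated_shifted = reference + 3.0  # same covariance, shifted mean

dist_same = frechet_distance(reference, generated_same)
dist_shifted = frechet_distance(reference, generated_shifted)
print(dist_same, dist_shifted)
```

In this toy setup the identical set yields a distance near zero, while the shifted set is penalized only through the mean term, since its covariance is unchanged. A lower value means the generated embeddings are statistically closer to the reference set.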
