Accompaniment Prompt Adherence: A Measure for Evaluating Music Accompaniment Systems
Maarten Grachten, Javier Nistal
TL;DR
The paper addresses the lack of standardized metrics for evaluating how well a generated accompaniment stem adheres to a given musical context prompt. It introduces Accompaniment Prompt Adherence (APA), a distribution-based metric built on Fréchet Audio Distance (FAD) and computed from pre-trained embeddings such as CLAP. APA is defined as $\mathrm{APA} = \frac{1}{2} + \frac{\mathrm{FAD}_{C,R'} - \mathrm{FAD}_{C,R}}{2 \, \mathrm{FAD}_{R,R'}}$, clipped to $[0,1]$, and quantifies adherence without any additional training. APA is validated through objective perturbation experiments and subjective listening tests, which show that it aligns with human judgments and is sensitive to degradations. It is implemented in an open-source Python package. The work demonstrates the practical utility of APA for evaluating and comparing music accompaniment generation systems across diverse datasets and embedding configurations.
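The APA formula above can be illustrated with a minimal Python sketch that starts from three precomputed FAD values. The function name `apa_from_fad` and its parameter names are hypothetical and are not the API of the released package. The summary does not define the symbols, so reading $C$ as the candidate set, $R$ as the reference set, and $R'$ as a mismatched reference set is an assumption; the guard against a non-positive denominator is likewise an added assumption.

```python
def apa_from_fad(fad_c_r: float, fad_c_rprime: float, fad_r_rprime: float) -> float:
    """Combine three FAD values into an APA score clipped to [0, 1].

    Hypothetical helper, not the released package's API.
    fad_c_r      : FAD_{C,R}   (assumed: candidate set vs. reference set)
    fad_c_rprime : FAD_{C,R'}  (assumed: candidate set vs. mismatched reference)
    fad_r_rprime : FAD_{R,R'}  (assumed: reference vs. mismatched reference)
    """
    # Assumption: a non-positive normaliser leaves the ratio undefined.
    if fad_r_rprime <= 0:
        raise ValueError("FAD_{R,R'} must be positive to normalise the score")

    # APA = 1/2 + (FAD_{C,R'} - FAD_{C,R}) / (2 * FAD_{R,R'})
    raw = 0.5 + (fad_c_rprime - fad_c_r) / (2.0 * fad_r_rprime)

    # Clip to the unit interval, as stated in the definition.
    return min(1.0, max(0.0, raw))


if __name__ == "__main__":
    # Illustrative numbers only, not results from the paper.
    print(apa_from_fad(fad_c_r=0.0, fad_c_rprime=1.0, fad_r_rprime=1.0))  # 1.0
    print(apa_from_fad(fad_c_r=1.0, fad_c_rprime=1.0, fad_r_rprime=2.0))  # 0.5
    print(apa_from_fad(fad_c_r=1.0, fad_c_rprime=0.0, fad_r_rprime=1.0))  # 0.0
```

Under this reading, a candidate set that is as close to $R$ as possible and as far from $R'$ as $R$ itself scores 1, a candidate set equidistant from both scores 0.5, and a candidate set that matches $R'$ scores 0.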
Abstract
Generative systems for musical accompaniment are proliferating rapidly, yet there is no standardized metric for evaluating how well a generated accompaniment aligns with the conditioning audio prompt. We introduce a distribution-based measure called "Accompaniment Prompt Adherence" (APA) and validate it through objective experiments on synthetic data perturbations and through human listening tests. Results show that APA aligns well with human judgments of adherence and is sensitive to transformations that degrade adherence. We release a Python implementation of the metric based on the widely adopted pre-trained CLAP embedding model, offering a valuable tool for evaluating and comparing accompaniment generation systems.
