RF-GML: Reference-Free Generative Machine Listener
Arijit Biswas, Guanxin Jiang
TL;DR
RF-GML tackles the lack of reliable reference-free metrics for high-fidelity coded audio by modeling listening scores with a two-parameter logistic distribution, characterized by mean $\mu$ and scale $a$. It adapts a state-of-the-art full-reference Generative Machine Listener (GML) through selective weight transfer to form an RF-capable model trained on per-listener MUSHRA scores at 48 kHz. The approach is validated against MPEG-USAC test sets and internal binaural tests, showing strong correlation with subjective ratings and improved handling of unencoded audio compared with prior RF metrics. This work enables robust, reference-free quality monitoring suitable for streaming, archiving, and codec development.
Abstract
This paper introduces a novel reference-free (RF) audio quality metric called the RF-Generative Machine Listener (RF-GML), designed to evaluate coded mono, stereo, and binaural audio at a 48 kHz sample rate. RF-GML leverages transfer learning from a state-of-the-art full-reference (FR) Generative Machine Listener (GML) with minimal architectural modifications. The term "generative" refers to the model's ability to generate an arbitrary number of simulated listening scores. Unlike existing RF models, RF-GML accurately predicts subjective quality scores across diverse content types and codecs. Extensive evaluations demonstrate its superiority in rating unencoded audio and distinguishing different levels of coding artifacts. RF-GML's performance and versatility make it a valuable tool for coded audio quality assessment and monitoring in various applications, all without the need for a reference signal.
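To make the "generative" idea concrete, the following is a minimal sketch of drawing simulated listening scores from a two-parameter logistic distribution with mean $\mu$ and scale $a$. It is not the authors' implementation: in RF-GML these two parameters would be predicted by the network from the audio, whereas here they are hand-picked illustrative values. The function name `simulate_listening_scores` is hypothetical, and clipping to the 0 to 100 MUSHRA range is an assumption.

```python
import numpy as np


def simulate_listening_scores(mu, a, n_listeners, seed=None):
    """Draw n_listeners simulated scores from Logistic(mu, a).

    mu : mean of the logistic distribution (predicted quality score)
    a  : scale of the logistic distribution (spread across listeners)
    Clipping to [0, 100] is an assumption reflecting the MUSHRA scale.
    """
    rng = np.random.default_rng(seed)
    scores = rng.logistic(loc=mu, scale=a, size=n_listeners)
    return np.clip(scores, 0.0, 100.0)


# Illustrative values only; they are not taken from the paper.
scores = simulate_listening_scores(mu=72.0, a=4.0, n_listeners=1000, seed=0)
mean_score = float(scores.mean())
```

Because the distribution is parametric, any number of simulated listeners can be drawn for a single item, and summary statistics such as the mean or a confidence interval can then be computed from the samples.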
