RF-GML: Reference-Free Generative Machine Listener

Arijit Biswas, Guanxin Jiang

TL;DR

RF-GML tackles the lack of reliable reference-free metrics for high-fidelity coded audio by modeling listening scores with a two-parameter logistic distribution, characterized by mean $\mu$ and scale $a$. It adapts a state-of-the-art full-reference Generative Machine Listener (GML) through selective weight transfer to form an RF-capable model trained on per-listener MUSHRA scores at 48 kHz. The approach is validated against MPEG-USAC test sets and internal binaural tests, showing strong correlation with subjective ratings and improved handling of unencoded audio compared with prior RF metrics. This work enables robust, reference-free quality monitoring suitable for streaming, archiving, and codec development.

Abstract

This paper introduces a novel reference-free (RF) audio quality metric called the RF-Generative Machine Listener (RF-GML), designed to evaluate coded mono, stereo, and binaural audio at a 48 kHz sample rate. RF-GML leverages transfer learning from a state-of-the-art full-reference (FR) Generative Machine Listener (GML) with minimal architectural modifications. The term "generative" refers to the model's ability to generate an arbitrary number of simulated listening scores. Unlike existing RF models, RF-GML accurately predicts subjective quality scores across diverse content types and codecs. Extensive evaluations demonstrate its superiority in rating unencoded audio and distinguishing different levels of coding artifacts. RF-GML's performance and versatility make it a valuable tool for coded audio quality assessment and monitoring in various applications, all without the need for a reference signal.
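
As an illustration of the two-parameter logistic model and of what "generative" means here, the following is a minimal sketch, not the authors' implementation. It assumes the model has already produced a mean $\mu$ and scale $a$ for one audio excerpt. The function names `logistic_nll` and `simulate_scores` are hypothetical, and clipping samples to the 0 to 100 MUSHRA range is an assumption of this sketch rather than something stated in the paper.

```python
import numpy as np


def logistic_nll(scores, mu, a):
    """Mean negative log-likelihood of listener scores under Logistic(mu, a).

    Uses -log f(x) = z + log(a) + 2*log(1 + exp(-z)) with z = (x - mu) / a.
    """
    z = (np.asarray(scores, dtype=float) - mu) / a
    # logaddexp(0, -z) evaluates log(1 + exp(-z)) without overflow
    return float(np.mean(z + np.log(a) + 2.0 * np.logaddexp(0.0, -z)))


def simulate_scores(mu, a, n, rng=None, clip=(0.0, 100.0)):
    """Draw n simulated listening scores from Logistic(mu, a).

    Clipping to the MUSHRA range is an assumption made for this sketch.
    """
    rng = np.random.default_rng() if rng is None else rng
    samples = rng.logistic(loc=mu, scale=a, size=n)
    return np.clip(samples, *clip)


if __name__ == "__main__":
    # Hypothetical predicted parameters for a single excerpt
    mu_hat, a_hat = 80.0, 5.0
    scores = simulate_scores(mu_hat, a_hat, n=1000, rng=np.random.default_rng(0))
    print("simulated mean score:", scores.mean())
    print("NLL at predicted parameters:", logistic_nll(scores, mu_hat, a_hat))
```

Because the logistic distribution is fully described by $\mu$ and $a$, any number of simulated listener scores can be drawn once those two values are predicted, and the same density provides a negative log-likelihood of the kind plotted in Figure 1.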

Paper Structure (10 sections, 3 equations, 3 figures, 1 table)


Figures (3)

  • Figure 1: NLL training losses of GML and RF-GML flavors.
  • Figure 2: Scaling of RF-GML scores with bitrate: 24 stereo excerpts from usac_lt, coded with HE-AAC (v1 or v2) and AAC.
  • Figure 3: Correlation of predicted quality scores versus audio bandwidth: -0.08 (SESQA), 0.51 (RF-GML (def)), 0.44 (RF-GML (deg)), with RF-GML (deg) rating unencoded audio closer to 100.