Exploring Perceptual Audio Quality Measurement on Stereo Processing Using the Open Dataset of Audio Quality
Pablo M. Delgado, Sascha Dick, Christoph Thompson, Chih-Wei Wu, Phillip A. Williams
TL;DR
This work analyzes the Open Dataset of Audio Quality (ODAQ) update focused on stereo processing (Mid/Side and Left/Right) to evaluate how existing objective metrics predict subjective quality across monaural and binaural distortions. It compares a range of timbre- and spatial-quality metrics, including MoBi-Q, eMoBi-Q, PEMO-Q, and binaural extensions of PEAQ, using MUSHRA tests and careful pre-processing to reveal strengths and limitations in predicting stereo degradation. The findings show timbre-based metrics often predict well in simple contexts, while binaural metrics offer limited gains and can be confounded by presentation context and hard-panned artifacts, highlighting the need for models that jointly capture bottom-up psychoacoustics and top-down listening factors. The results guide future metric development toward robust integration of timbral and spatial cues, with data-driven handling of context to improve perceptual quality prediction in realistic stereo scenarios.
Abstract
ODAQ (Open Dataset of Audio Quality) provides a comprehensive framework for exploring both monaural and binaural audio quality degradations across a range of distortion classes and signals, accompanied by subjective quality ratings. A recent update of ODAQ, focusing on the impact of stereo processing methods such as Mid/Side (MS) and Left/Right (LR), provides test signals and subjective ratings for the in-depth investigation of state-of-the-art objective audio quality metrics. Our evaluation results suggest that, while timbre-focused metrics often yield robust results under simpler conditions, their prediction performance tends to suffer under conditions with a more complex presentation context. Our findings underscore the importance of modeling the interplay of bottom-up psychoacoustic processes and top-down contextual factors, guiding future research toward models that more effectively integrate both timbral and spatial dimensions of perceived audio quality.
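
For readers unfamiliar with the distinction between MS and LR processing, the following is a minimal illustrative sketch, not the processing used to generate the ODAQ signals. It assumes the common 0.5-scaled sum/difference convention for MS and uses a toy uniform quantizer as a stand-in for a real distortion; the function names and sample values are hypothetical. It shows one reason hard-panned content can be critical: an error introduced only in the side channel reappears in both decoded channels, including the one that was originally silent.

```python
def ms_encode(left, right):
    """Convert left/right samples to mid/side (0.5 scaling is an assumed convention)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid, side):
    """Invert ms_encode: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


def quantize(samples, step):
    """Toy uniform quantizer standing in for a real coding distortion."""
    return [round(x / step) * step for x in samples]


# Hypothetical hard-panned signal: all energy in the left channel.
left = [0.30, -0.70, 0.50, 0.10]
right = [0.0, 0.0, 0.0, 0.0]

mid, side = ms_encode(left, right)

# Distort only the side channel, then return to the left/right domain.
side_q = quantize(side, 0.2)
left_hat, right_hat = ms_decode(mid, side_q)

# Error per output channel; the originally silent right channel is no longer silent.
left_err = [a - b for a, b in zip(left_hat, left)]
right_err = [a - b for a, b in zip(right_hat, right)]
print("left error: ", left_err)
print("right error:", right_err)
```

In this toy case the side-channel error appears with opposite sign in the two decoded channels, whereas distorting the left and right channels independently would leave the silent channel untouched.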
