Exploring Perceptual Audio Quality Measurement on Stereo Processing Using the Open Dataset of Audio Quality

Pablo M. Delgado, Sascha Dick, Christoph Thompson, Chih-Wei Wu, Phillip A. Williams

TL;DR

This work analyzes the Open Dataset of Audio Quality (ODAQ) update focused on stereo processing (Mid/Side and Left/Right) to evaluate how existing objective metrics predict subjective quality across monaural and binaural distortions. It compares a range of timbre- and spatial-quality metrics, including MoBi-Q, eMoBi-Q, PEMO-Q, and binaural extensions of PEAQ, using MUSHRA tests and careful pre-processing to reveal strengths and limitations in predicting stereo degradation. The findings show timbre-based metrics often predict well in simple contexts, while binaural metrics offer limited gains and can be confounded by presentation context and hard-panned artifacts, highlighting the need for models that jointly capture bottom-up psychoacoustics and top-down listening factors. The results guide future metric development toward robust integration of timbral and spatial cues, with data-driven handling of context to improve perceptual quality prediction in realistic stereo scenarios.

Abstract

ODAQ (Open Dataset of Audio Quality) provides a comprehensive framework for exploring both monaural and binaural audio quality degradations across a range of distortion classes and signals, accompanied by subjective quality ratings. A recent update of ODAQ, focusing on the impact of stereo processing methods such as Mid/Side (MS) and Left/Right (LR), provides test signals and subjective ratings for the in-depth investigation of state-of-the-art objective audio quality metrics. Our evaluation results suggest that, while timbre-focused metrics often yield robust results under simpler conditions, their prediction performance tends to suffer under conditions with a more complex presentation context. Our findings underscore the importance of modeling the interplay of bottom-up psychoacoustic processes and top-down contextual factors, guiding future research toward models that more effectively integrate both timbral and spatial dimensions of perceived audio quality.
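
To make the MS/LR distinction concrete, the following is a minimal sketch, not the processing used to build ODAQ. It assumes the common 0.5-scaled sum/difference convention for Mid/Side, and `degrade` is a hypothetical stand-in (plain attenuation) for a real degradation. It illustrates how processing applied only to the side channel of a hard-panned signal changes both output channels after conversion back to LR.

```python
# Illustrative Mid/Side (MS) <-> Left/Right (LR) conversion in pure Python.
# Assumption: mid = 0.5*(L+R), side = 0.5*(L-R); other scalings are also in use.

def lr_to_ms(left, right):
    """Convert left/right sample lists to mid/side sample lists."""
    mid = [0.5 * (l + r) for l, r in zip(left, right)]
    side = [0.5 * (l - r) for l, r in zip(left, right)]
    return mid, side

def ms_to_lr(mid, side):
    """Inverse of lr_to_ms under the same scaling convention."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

def degrade(channel, gain=0.5):
    """Hypothetical degradation: simple attenuation, for illustration only."""
    return [gain * x for x in channel]

# A hard-panned source: all energy in the left channel, right is silent.
left = [1.0, 0.5]
right = [0.0, 0.0]

mid, side = lr_to_ms(left, right)

# MS-only processing: degrade the side channel, then return to LR.
left_out, right_out = ms_to_lr(mid, degrade(side))

print(left_out)   # left channel is changed
print(right_out)  # right channel is no longer silent
```

In this toy case the originally silent right channel receives signal, which is one way MS-domain processing can interact with hard-panned content; the actual ODAQ degradations are more elaborate than an attenuation.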

Paper Structure

This paper contains 21 sections, 1 equation, 7 figures, 1 table.

Figures (7)

  • Figure 1: Experimental setup for the generation and quality assessment of the degraded audio files.
  • Figure 2: Experimental conditions for the degraded files. In the MUSHRA test, listeners were presented with LR-only degradations (applied independently to the left and right channels), MS-only degradations (applied to the mid and side channels), or a mixture of both.
  • Figure 3: Correlation between objective metric predictions and subjective quality scores for each experiment, taking into account artifact type and presentation context. $\hbox{CI}_{95\%} \leq \pm 0.01$ for all estimates.
  • Figure 4: Correlation between objective metric predictions and subjective quality scores for each audio excerpt and their respective treatments, shown for audio excerpts without hard-panned auditory objects. $\hbox{CI}_{95\%} \leq \pm 0.01$ for all estimates.
  • Figure 5: Correlation between objective metric predictions and subjective quality scores for each audio excerpt and their respective treatments, shown for audio excerpts with hard-panned auditory objects. $\hbox{CI}_{95\%} \leq \pm 0.01$ for all estimates.
  • ...and 2 more figures