Quantifying Spatial Audio Quality Impairment

Karn N. Watcharasupat, Alexander Lerch

TL;DR

This work addresses objective assessment of the geometric component of spatial audio quality by modeling a multichannel test signal as a spatially distorted projection of a reference, expressed through interchannel delays and gains. It introduces a duplex-theory–based decomposition that estimates an optimal low-tap multichannel filter, yielding two metrics, $SSR$ and $SRR$, defined as $SSR(\hat{\mathbf{s}};\mathbf{s}) = 10 \log_{10}\left(\dfrac{\|\mathbf{s}\|^2}{\|\mathbf{e}_{\text{spat}}\|^2}\right)$ and $SRR(\hat{\mathbf{s}};\mathbf{s}) = 10 \log_{10}\left(\dfrac{\|\tilde{\mathbf{s}}\|^2}{\|\mathbf{e}_{\text{resid}}\|^2}\right)$. The method supports framewise processing with a 2 s window and 50% overlap. It is shown to be robust to common degradations such as codec compression and music source separation, and is validated on both synthetic and real multichannel bed–object scenes. An open-source Python implementation is provided, offering a practical, dataset-independent tool for quantifying spatial impairment in diverse multichannel configurations.
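As a rough illustration of the two energy ratios and the framewise processing described above, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the error signals $\mathbf{e}_{\text{spat}}$ and $\mathbf{e}_{\text{resid}}$ have already been obtained by some decomposition. The function names (`energy_ratio_db`, `frames`, `framewise_ratio_db`), the `eps` guard, and the choice to drop a trailing partial frame are all assumptions made for this example.

```python
import numpy as np


def energy_ratio_db(signal, error, eps=1e-12):
    """Return 10*log10(||signal||^2 / ||error||^2) in dB.

    `eps` is an assumed guard against division by zero; it is not
    part of the formulas quoted in the text.
    """
    num = np.sum(np.square(signal))
    den = np.sum(np.square(error))
    return 10.0 * np.log10((num + eps) / (den + eps))


def frames(x, fs, window_s=2.0, overlap=0.5):
    """Yield frames along the last axis (2 s window, 50% overlap by default).

    A trailing partial frame is dropped; that choice is an assumption.
    """
    win = int(round(window_s * fs))
    hop = max(1, int(round(win * (1.0 - overlap))))
    n = x.shape[-1]
    for start in range(0, n - win + 1, hop):
        yield x[..., start:start + win]


def framewise_ratio_db(signal, error, fs):
    """Energy ratio per frame, for arrays shaped (channels, samples)."""
    return np.array([
        energy_ratio_db(s, e)
        for s, e in zip(frames(signal, fs), frames(error, fs))
    ])
```

Under these assumptions, `energy_ratio_db(s, e_spat)` corresponds to the $SSR$ formula and `energy_ratio_db(s_tilde, e_resid)` to the $SRR$ formula. How the per-frame values are aggregated is left open here.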

Abstract

Spatial audio quality is a highly multifaceted concept, with many interactions between environmental, geometrical, anatomical, psychological, and contextual considerations. Methods for characterization or evaluation of the geometrical components of spatial audio quality, however, remain scarce, despite being perhaps the least subjective aspect of spatial audio quality to quantify. By considering interchannel time and level differences relative to a reference signal, it is possible to construct a signal model to isolate some of the spatial distortion. By using a combination of least-square optimization and heuristics, we propose a signal decomposition method to isolate the spatial error from a processed signal, in terms of interchannel gain leakages and changes in relative delays. This allows the computation of simple energy-ratio metrics, providing objective measures of spatial and non-spatial signal qualities, with minimal assumptions and no dataset dependency. Experiments demonstrate the robustness of the method against common spatial signal degradation introduced by, e.g., audio compression and music source separation. Implementation is available at https://github.com/karnwatcharasupat/spauq.
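To make the idea of combining a delay heuristic with least-squares gains concrete, here is a simplified sketch under stated assumptions. It is not the decomposition algorithm from the paper or the `spauq` package. It assumes integer-sample delays, one delayed copy of each reference channel per test channel, and a cross-correlation peak as the delay heuristic. The names `shift`, `best_delay`, `spatial_projection`, and the parameter `max_delay` are hypothetical.

```python
import numpy as np


def shift(x, d):
    """Delay a 1-D signal by d samples (negative d advances), zero-padded."""
    y = np.zeros_like(x)
    if d > 0:
        y[d:] = x[:-d]
    elif d < 0:
        y[:d] = x[-d:]
    else:
        y[:] = x
    return y


def best_delay(ref_ch, test_ch, max_delay):
    """Heuristic: the integer delay with the largest absolute correlation."""
    delays = range(-max_delay, max_delay + 1)
    scores = [abs(np.dot(shift(ref_ch, d), test_ch)) for d in delays]
    return delays[int(np.argmax(scores))]


def spatial_projection(ref, test, max_delay=8):
    """Project each test channel onto delayed reference channels.

    ref, test: float arrays shaped (channels, samples).
    Returns an array shaped like `test`, holding the least-squares fit.
    """
    proj = np.zeros_like(test, dtype=float)
    for i in range(test.shape[0]):
        # One delayed copy of every reference channel forms the design matrix.
        cols = [
            shift(ref[j], best_delay(ref[j], test[i], max_delay))
            for j in range(ref.shape[0])
        ]
        A = np.stack(cols, axis=1)  # shape: (samples, reference channels)
        # Least-squares gains capture interchannel level changes and leakage.
        gains, *_ = np.linalg.lstsq(A, test[i], rcond=None)
        proj[i] = A @ gains
    return proj
```

In this sketch, the gap between the projection and the reference would play the role of a spatial error, and whatever the projection cannot explain in the test signal would play the role of a residual error. The exact definitions used by the authors are not reproduced here.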

Paper Structure (17 sections, 7 equations, 4 figures)

Figures (4)

  • Figure 1: SSR and SRR of the test signals w.r.t. their panning and (a) the reference signal panning; (b) the right-channel delay; (c) the cutoff frequency; (d) the SNR. Circular markers are experimental values, horizontally offset for readability. (b) In the SSR plot, each dash-dotted line connects the median values for one delay parameter; the gray area represents the theoretical range of the SSR. (c, d) Dotted lines are theoretical values.
  • Figure 2: SSR and SRR of the test signals w.r.t. their azimuthal locations and the object-to-bed energy ratio. Each circular marker represents the mean over frames and bed–object pairs; each vertical line is the 95% confidence interval of the mean.
  • Figure 3: Change in SSR and SRR of the test signals compressed by AAC, relative to the operating mode without joint encoding, by operating mode and average bitrate.
  • Figure 4: Evaluation results on the MUSDB18-HQ test set.