Quantifying Spatial Audio Quality Impairment
Karn N. Watcharasupat, Alexander Lerch
TL;DR
This work tackles objective assessment of geometric spatial-audio quality by modeling multichannel test signals as spatially distorted projections of a reference, using interchannel delays and gains. It introduces a duplex-theory-based decomposition that estimates an optimal low-tap multichannel filter, yielding two metrics, the signal-to-spatial-distortion ratio ($SSR$) and the signal-to-residual-distortion ratio ($SRR$), defined as $SSR(\hat{\mathbf{s}};\mathbf{s}) = 10 \log_{10}\left(\dfrac{\|\mathbf{s}\|^2}{\|\mathbf{e}_{\text{spat}}\|^2}\right)$ and $SRR(\hat{\mathbf{s}};\mathbf{s}) = 10 \log_{10}\left(\dfrac{\|\tilde{\mathbf{s}}\|^2}{\|\mathbf{e}_{\text{resid}}\|^2}\right)$, where $\mathbf{s}$ is the reference, $\hat{\mathbf{s}}$ the test signal, $\tilde{\mathbf{s}}$ the spatially filtered projection of the reference, $\mathbf{e}_{\text{spat}} = \tilde{\mathbf{s}} - \mathbf{s}$ the spatial error, and $\mathbf{e}_{\text{resid}} = \hat{\mathbf{s}} - \tilde{\mathbf{s}}$ the residual error. The method supports framewise processing with a 2 s window and 50% overlap. It is shown to be robust to common degradations such as codec compression and music source separation, and is validated on both synthetic and real multichannel bed–object scenes. An open-source Python implementation is provided, offering a practical, dataset-independent tool for quantifying spatial impairment across diverse multichannel configurations.
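The two metrics and the framing can be illustrated with a minimal NumPy sketch. It assumes the spatially filtered projection $\tilde{\mathbf{s}}$ is already available (the hypothetical input `s_tilde`); estimating it is the core of the method and is not reproduced here. The function names are illustrative and are not the API of the spauq package. The small `EPS` guard against silent frames is likewise an assumption, not something stated in the source.

```python
import numpy as np

EPS = 1e-12  # assumption: guards against division by zero / log of zero on silent frames


def energy_ratio_db(num, den):
    """10*log10(||num||^2 / ||den||^2), summed over all channels and samples."""
    return 10.0 * np.log10((np.sum(num ** 2) + EPS) / (np.sum(den ** 2) + EPS))


def ssr_srr(s, s_tilde, s_hat):
    """SSR and SRR for one frame; all arrays have shape (channels, samples)."""
    e_spat = s_tilde - s       # spatial error: projection minus reference
    e_resid = s_hat - s_tilde  # residual error: test signal minus projection
    return energy_ratio_db(s, e_spat), energy_ratio_db(s_tilde, e_resid)


def frame_bounds(n_samples, fs, window_s=2.0, overlap=0.5):
    """(start, stop) sample indices for windows of window_s seconds at the given overlap."""
    win = int(round(window_s * fs))
    hop = max(1, int(round(win * (1.0 - overlap))))
    return [(start, start + win) for start in range(0, max(n_samples - win, 0) + 1, hop)]


def framewise_ssr_srr(s, s_tilde, s_hat, fs):
    """List of (SSR, SRR) pairs, one per 2 s frame with 50% overlap."""
    return [
        ssr_srr(s[:, a:b], s_tilde[:, a:b], s_hat[:, a:b])
        for a, b in frame_bounds(s.shape[-1], fs)
    ]


# Toy example: a pure 0.9x gain change plays the role of the spatial error,
# and a small amount of additive noise plays the role of the residual.
fs = 8000
rng = np.random.default_rng(0)
s = rng.standard_normal((2, 5 * fs))
s_tilde = 0.9 * s
s_hat = s_tilde + 0.01 * rng.standard_normal(s.shape)
results = framewise_ssr_srr(s, s_tilde, s_hat, fs)
```

A 5 s signal yields four 2 s frames with a 1 s hop. Because the spatial error here is $-0.1\,\mathbf{s}$, every frame's SSR equals $20\log_{10}(1/0.1) = 20$ dB. SRR lands near $10\log_{10}(0.81/10^{-4}) \approx 39$ dB.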
Abstract
Spatial audio quality is a highly multifaceted concept, with many interactions between environmental, geometrical, anatomical, psychological, and contextual considerations. Methods for characterization or evaluation of the geometrical components of spatial audio quality, however, remain scarce, despite being perhaps the least subjective aspect of spatial audio quality to quantify. By considering interchannel time and level differences relative to a reference signal, it is possible to construct a signal model to isolate some of the spatial distortion. By using a combination of least-squares optimization and heuristics, we propose a signal decomposition method to isolate the spatial error from a processed signal, in terms of interchannel gain leakages and changes in relative delays. This allows the computation of simple energy-ratio metrics, providing objective measures of spatial and non-spatial signal qualities, with minimal assumptions and no dataset dependency. Experiments demonstrate the robustness of the method against common spatial signal degradations introduced by, e.g., audio compression and music source separation. Implementation is available at https://github.com/karnwatcharasupat/spauq.
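The idea of combining heuristics with least squares to separate gain leakage and delay changes can be illustrated with a deliberately simplified NumPy sketch; this is not the authors' algorithm. Each path from reference channel $j$ to test channel $i$ is modeled as a single tap (one integer delay and one gain) rather than the low-tap filter mentioned in the TL;DR. Delays are chosen by the heuristic of maximizing absolute cross-correlation over a hypothetical search range `max_lag`, and the gains for each test channel are then solved jointly by least squares. The sketch assumes reference and test have the same channel count, so that $\mathbf{e}_{\text{spat}} = \tilde{\mathbf{s}} - \mathbf{s}$ is defined. The names `delay`, `decompose`, and `max_lag` are illustrative only.

```python
import numpy as np


def delay(x, d):
    """Shift a 1-D signal by integer d samples (d < 0 advances it), zero-padding the gap."""
    y = np.zeros_like(x)
    if d >= 0:
        y[d:] = x[: len(x) - d]
    else:
        y[:d] = x[-d:]
    return y


def decompose(s, s_hat, max_lag=10):
    """Split s_hat (channels, samples) as s + e_spat + e_resid under a single-tap model:
    s_tilde[i] = sum_j gains[i, j] * delay(s[j], delays[i, j])."""
    n_ref = s.shape[0]
    n_test = s_hat.shape[0]
    lags = np.arange(-max_lag, max_lag + 1)
    s_tilde = np.zeros_like(s_hat)
    gains = np.zeros((n_test, n_ref))
    delays = np.zeros((n_test, n_ref), dtype=int)
    for i in range(n_test):
        # Heuristic: for each reference channel, keep the lag with the largest |cross-correlation|.
        for j in range(n_ref):
            xcorr = [abs(np.dot(delay(s[j], int(d)), s_hat[i])) for d in lags]
            delays[i, j] = lags[int(np.argmax(xcorr))]
        # Least squares: jointly fit one gain per delayed reference channel.
        A = np.stack([delay(s[j], int(delays[i, j])) for j in range(n_ref)], axis=1)
        g, *_ = np.linalg.lstsq(A, s_hat[i], rcond=None)
        gains[i] = g
        s_tilde[i] = A @ g
    e_spat = s_tilde - s       # spatial error (requires matching channel counts)
    e_resid = s_hat - s_tilde  # whatever the single-tap spatial model cannot explain
    return s_tilde, e_spat, e_resid, gains, delays


# Synthetic check: two independent noise channels, cross-mixed with known gains and delays.
rng = np.random.default_rng(0)
s = rng.standard_normal((2, 4000))
true_gains = np.array([[0.8, 0.2], [0.3, 0.9]])
true_delays = np.array([[3, 0], [0, -2]])
s_hat = np.stack([
    sum(true_gains[i, j] * delay(s[j], int(true_delays[i, j])) for j in range(2))
    for i in range(2)
]) + 1e-3 * rng.standard_normal((2, 4000))
s_tilde, e_spat, e_resid, gains, delays = decompose(s, s_hat, max_lag=5)
```

On this synthetic mixture the sketch should recover the true delays and gains closely, and $\hat{\mathbf{s}} = \mathbf{s} + \mathbf{e}_{\text{spat}} + \mathbf{e}_{\text{resid}}$ holds by construction. Real signals with correlated channels or fractional delays would need the richer filter model and heuristics the abstract refers to.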
