QASTAnet: A DNN-based Quality Metric for Spatial Audio

Adrien Llave, Emma Granier, Grégory Pallone

TL;DR

This paper proposes QASTAnet (Quality Assessment for SpaTial Audio network), a deep-neural-network-based quality metric specialized for spatial audio (ambisonics and binaural), and reports that it overcomes the poor generalization of existing methods to real-world signals.

Abstract

In the development of spatial audio technologies, reliable and shared methods for evaluating audio quality are essential. Listening tests are currently the standard but remain costly in terms of time and resources. Several models predicting subjective scores have been proposed, but they do not generalize well to real-world signals. In this paper, we propose QASTAnet (Quality Assessment for SpaTial Audio network), a new metric based on a deep neural network and specialized for spatial audio (ambisonics and binaural). As training data is scarce, we aim for the model to be trainable with a small amount of data. To do so, we propose to rely on expert modeling of the low-level auditory system and to use a neural network to model the high-level cognitive function of quality judgement. We compare its performance to two reference metrics on a wide range of content types (speech, music, ambiance, anechoic, reverberated), with a focus on codec artifacts. Results demonstrate that QASTAnet overcomes the aforementioned limitations of the existing methods. The strong correlation between the proposed metric's predictions and subjective scores makes it a good candidate for comparing codecs during their development.

Paper Structure

This paper contains 12 sections, 2 equations, 3 figures, 1 table.

Figures (3)

  • Figure 1: Schematic representation of the QASTAnet metric architecture.
  • Figure 2: Histograms of MUSHRA ratings for the training and test sets. The hidden reference ratings are excluded.
  • Figure 3: QASTAnet (left) and eMoBi-Q (right) predictions vs. MUSHRA rating for each combination of stimulus and CuT. The evaluation does not consider the hidden reference ratings. Orange markers denote signals that include spatial reverberation; black markers denote anechoic signals generated with an ideal plane-wave encoding.
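
The comparison in Figure 3 comes down to measuring how well a metric's predictions track MUSHRA ratings once the hidden reference ratings are set aside. The sketch below shows one way such an agreement score could be computed. It is not the authors' evaluation code: the choice of Pearson and Spearman correlation, the record fields (`predicted`, `mushra`, `is_hidden_reference`), and the toy values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(records):
    """Return (Pearson, Spearman) over records, skipping hidden references."""
    kept = [r for r in records if not r["is_hidden_reference"]]
    predicted = [r["predicted"] for r in kept]
    mushra = [r["mushra"] for r in kept]
    return pearson(predicted, mushra), spearman(predicted, mushra)


# Made-up placeholder values, NOT data from the paper.
records = [
    {"predicted": 10.0, "mushra": 20.0, "is_hidden_reference": False},
    {"predicted": 30.0, "mushra": 35.0, "is_hidden_reference": False},
    {"predicted": 50.0, "mushra": 60.0, "is_hidden_reference": False},
    {"predicted": 90.0, "mushra": 80.0, "is_hidden_reference": False},
    {"predicted": 100.0, "mushra": 100.0, "is_hidden_reference": True},
]

r, rho = evaluate(records)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Excluding the hidden reference matters because those ratings cluster near the top of the scale and would inflate any correlation computed over the full set. Pearson correlation measures linear agreement, while Spearman correlation only checks that the predicted ordering of conditions matches the listeners' ordering, which is often the property of interest when ranking codecs.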