Universal Spatial Audio Transcoder

Amaia Sagasti; Davide Scaini; Daniel Arteaga

Abstract

This paper addresses the challenges associated with both the conversion between different spatial audio formats and the decoding of a spatial audio format to a specific loudspeaker layout. Existing approaches often rely on layout remapping tools, which may not guarantee optimal conversion from a psychoacoustic perspective. To overcome these challenges, we present the Universal Spatial Audio Transcoder (USAT) method and its corresponding open-source implementation. USAT generates an optimal decoder or transcoder for any input spatial audio format, adapting it to any output format or 2D/3D loudspeaker configuration. Drawing upon optimization techniques based on psychoacoustic principles, the algorithm maximizes the preservation of spatial information. We present examples of the decoding and transcoding of several audio formats, and show that the USAT approach is advantageous compared to the most common methods in the field.

Paper Structure (16 sections, 20 equations, 8 figures, 2 tables)

Introduction
Algorithm description
Overview of the algorithm
Encoding, transcoding and decoding matrices
Cost function
Linear decoding terms (coherence)
Quadratic decoding terms (incoherence)
Other cost function terms
Total cost function
Cost function minimization
Example applications
5th order Ambisonics decoding to 7.0.4
7.0.4 transcoding to 5th order Ambisonics
5.0.2 decoding to irregular 3.0.1
Audio object decoding to 5.0
...and 1 more sections

Figures (8)

Figure 1: Overview of the optimization process in USAT. $M$ and $N$ are the number of input and output channels, respectively; $L$ is the number of sampled directions; and $P$ is the number of loudspeakers in the real or virtual layout.
Figure 2: 5OA decoding to 7.0.4. Loudspeaker gains corresponding to a virtual sound source encoded in 5OA on the horizontal plane at the indicated azimuth. Results with USAT (top) and AllRad (bottom). Only loudspeakers on the horizontal plane shown.
Figure 3: 5OA decoding to 7.0.4. Box plots of the energy in dB ($E$), apparent source width (ASW) and angular error ($\delta$), comparing USAT (orange) and AllRad (green), with ideal values indicated in blue. The box plots depict the median, interquartile range, and maximum range (excluding outliers) over a set of directions sampling the upper hemisphere.
Figure 4: 7.0.4 transcoding to 5OA. Box plots indicating the values of the pressure, ASW and angular error on a set of points sampling the upper hemisphere. USAT (orange) is compared to a direct encoding of each of the loudspeaker feeds into 5OA (green).
Figure 5: 5.0.2 decoding to irregular 3.0.1. Box plots indicating the values of the energy, ASW and angular error, for USAT (orange) and channel remapping with VBAP (green).
...and 3 more figures
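
The dimensions named in the Figure 1 caption ($M$, $N$, $L$, $P$) and the energy and angular-error quantities named in the box-plot captions can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the USAT implementation: the matrix names `G`, `T`, `D` and `S`, the 2D first-order Ambisonics input, the square loudspeaker layout, the sampling-decoder choice for `T`, and the energy-vector definition of angular error are all assumptions made here for the example; the paper's own definitions are in sections not reproduced on this page.

```python
import numpy as np

# Hypothetical sizes, named after the Figure 1 caption.
M, N, L, P = 3, 4, 72, 4  # input channels, output channels, sampled directions, loudspeakers

# Sampled source directions on the horizontal plane (azimuths in radians).
az = np.linspace(0.0, 2.0 * np.pi, L, endpoint=False)
src_dirs = np.stack([np.cos(az), np.sin(az)], axis=0)          # shape (2, L)

# G: gains of the input format for every sampled direction.
# Assumption: 2D first-order Ambisonics with channels W, X, Y.
G = np.vstack([np.ones(L), np.cos(az), np.sin(az)])            # shape (M, L)

# Loudspeaker unit vectors for an assumed square layout.
spk_az = np.deg2rad([45.0, 135.0, 225.0, 315.0])
spk_dirs = np.stack([np.cos(spk_az), np.sin(spk_az)], axis=0)  # shape (2, P)

# D: maps output-format channels to loudspeakers. Identity here because the
# output channels are taken to be the loudspeaker feeds themselves.
D = np.eye(P, N)                                               # shape (P, N)

# T: candidate transcoding matrix. An optimizer would adjust this matrix;
# a plain sampling decoder is used here as a fixed stand-in.
T = np.vstack([np.ones(P), np.cos(spk_az), np.sin(spk_az)]).T / P  # shape (N, M)

# Loudspeaker gains for every sampled direction.
S = D @ T @ G                                                  # shape (P, L)

# Energy per direction, and direction of the energy-weighted loudspeaker vector.
energy = np.sum(S**2, axis=0)                                  # shape (L,)
e_vec = (spk_dirs @ S**2) / energy                             # shape (2, L)

# Angular error between the energy vector and the intended source direction.
cos_err = np.sum(e_vec * src_dirs, axis=0) / np.linalg.norm(e_vec, axis=0)
ang_err_deg = np.degrees(np.arccos(np.clip(cos_err, -1.0, 1.0)))

print("S shape:", S.shape)
print("energy range (dB):", 10 * np.log10(energy.min()), 10 * np.log10(energy.max()))
print("max angular error (deg):", ang_err_deg.max())
```

In this symmetric toy case the energy is flat across directions and the angular error is essentially zero. An irregular layout, which is the situation the example applications target, would show direction-dependent deviations in both quantities.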
