Ambisonics Networks -- The Effect Of Radial Functions Regularization

Bar Shaybet; Anurag Kumar; Vladimir Tourbabin; Boaz Rafaely

TL;DR

This paper investigates how different regularization methods affect Deep Neural Network (DNN) training and performance. Results show that performance may be sensitive to the regularization method; an informed approach is proposed and investigated.

Abstract

Ambisonics, a popular format of spatial audio, is the spherical harmonic (SH) representation of the plane wave density function of a sound field. Many algorithms operate in the SH domain and use Ambisonics as their input signal. Encoding Ambisonics from a spherical microphone array involves dividing by the radial functions, which may amplify noise at low frequencies. Regularization can overcome this, at the cost of introducing errors into the Ambisonics encoding. This paper investigates the impact of different regularization methods on Deep Neural Network (DNN) training and performance. Ideally, these networks should be robust to the choice of regularization method. Simulated data of a single speaker in a room and experimental data from the LOCATA challenge were used to evaluate this robustness on an example algorithm: speaker localization based on the direct-path dominance (DPD) test. Results show that performance may be sensitive to the regularization method. An informed approach is proposed and investigated, highlighting the importance of regularization information.
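The trade-off described in the abstract (dividing by small radial functions amplifies noise, while regularizing introduces encoding error) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the paper's method: an open-sphere radial function $b_n(kr) = 4\pi i^n j_n(kr)$, a Tikhonov-style inverse $b_n^* / (|b_n|^2 + \lambda^2)$ with a hypothetical parameter `lam`, a speed of sound of 343 m/s, and the radius $r = 4.2$ cm taken from the Figure 1 caption.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value
RADIUS = 0.042          # m, r = 4.2 cm as in the Figure 1 caption


def spherical_jn(n, x):
    """Closed-form spherical Bessel functions j_0..j_2 (enough for this sketch)."""
    if n == 0:
        return math.sin(x) / x
    if n == 1:
        return math.sin(x) / x**2 - math.cos(x) / x
    if n == 2:
        return (3.0 / x**2 - 1.0) * math.sin(x) / x - 3.0 * math.cos(x) / x**2
    raise ValueError("only orders 0..2 are implemented in this sketch")


def kr_at(freq_hz):
    """Wavenumber times radius, kr = 2*pi*f/c * r."""
    return 2.0 * math.pi * freq_hz / SPEED_OF_SOUND * RADIUS


def radial_open_sphere(n, kr):
    """Assumed open-sphere radial function b_n(kr) = 4*pi*i^n*j_n(kr)."""
    return 4.0 * math.pi * (1j ** n) * spherical_jn(n, kr)


def plain_inverse(b):
    """Unregularized encoding filter 1/b_n; its magnitude is the noise gain."""
    return 1.0 / b


def tikhonov_inverse(b, lam):
    """Regularized filter conj(b)/(|b|^2 + lam^2); lam is a hypothetical parameter."""
    return b.conjugate() / (abs(b) ** 2 + lam ** 2)


def compare(order, freq_hz, lam):
    """Return (plain gain, regularized gain, residual signal gain b * filter)."""
    b = radial_open_sphere(order, kr_at(freq_hz))
    reg = tikhonov_inverse(b, lam)
    return abs(plain_inverse(b)), abs(reg), abs(b * reg)


for freq in (100.0, 1000.0):
    plain, reg, passed = compare(order=2, freq_hz=freq, lam=0.1)
    print(f"f={freq:6.0f} Hz  plain gain={plain:9.2f}  "
          f"regularized gain={reg:6.3f}  signal passed={passed:.4f}")
```

In this sketch the regularized gain can never exceed $1/(2\lambda)$, but the product of $b_n$ and the regularized filter falls below one wherever $|b_n|$ is small, so the corresponding SH component is attenuated rather than encoded exactly. That attenuation is the encoding error the abstract refers to.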

Paper Structure (12 sections, 9 equations, 5 figures)


Introduction
Ambisonics Encoding
Regularized PWD
The effect of regularization
Speaker localization using the DNN-DPD algorithm
Experimental Investigation
Setup
Simulated data
Measured data
Evaluation Methodology
Results
Conclusions

Figures (5)

Figure 1: $(a)$ noise gain $G_{\mathrm{noise}}$, and $(b)$ distortion $\mathrm{DIST}$ as a function of frequency. In this example $\mathbf{a}$ represents a unit amplitude plane wave and $r=4.2$ cm.
Figure 2: DNN-DPD based DOA estimation algorithm, showing the computation of the features and the details of the DNN-DPD block, followed by MUSIC. The network output, $(f_1, f_2)$, indicates the probability that the input contains a direct signal. $\hat{\theta}, \hat{\phi}$ are the estimated DOA of the speaker.
Figure 3: DOA error as a function of the top-selected bins, simulated test data. Bins were selected according to the DNN-DPD network output $f_1$ in Fig. 2.
Figure 4: DOA error as a function of frequency band for the test data. The top-scoring 5% of bins were selected from each frequency band. The markers represent the main frequency of each frequency band.
Figure 5: DOA estimation error as a function of selected top bins, for simulated and LOCATA data, with the informed and uninformed algorithms.
