Ambisonics Networks -- The Effect Of Radial Functions Regularization

Bar Shaybet, Anurag Kumar, Vladimir Tourbabin, Boaz Rafaely

TL;DR

This paper investigates how different regularization methods affect Deep Neural Network (DNN) training and performance. Results show that performance may be sensitive to the regularization method, and an informed approach is proposed and investigated.

Abstract

Ambisonics, a popular format of spatial audio, is the spherical harmonic (SH) representation of the plane wave density function of a sound field. Many algorithms operate in the SH domain and use Ambisonics signals as their input. Encoding Ambisonics from a spherical microphone array involves dividing by the radial functions, which may amplify noise at low frequencies. This can be overcome by regularization, at the cost of introducing errors into the Ambisonics encoding. This paper investigates the impact of different regularization methods on Deep Neural Network (DNN) training and performance. Ideally, these networks should be robust to the regularization method. Simulated data of a single speaker in a room and experimental data from the LOCATA challenge were used to evaluate this robustness on an example algorithm: speaker localization based on the direct-path dominance (DPD) test. Results show that performance may be sensitive to the regularization method. An informed approach is proposed and investigated, highlighting the importance of regularization information.
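The trade-off described in the abstract can be illustrated with a minimal sketch. Everything in it beyond the array radius of 4.2 cm (taken from the Figure 1 caption) is an assumption made for illustration and is not taken from the paper: an open-sphere radial function $b_n(kr) = 4\pi i^n j_n(kr)$, a Tikhonov-style regularized inverse with a hypothetical parameter `lam`, a speed of sound of 343 m/s, and the function names themselves. Only SH orders 0 and 1 are supported, to keep the sketch free of external libraries.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed value, not from the paper)


def spherical_jn(n, x):
    """Spherical Bessel function of the first kind, closed form for n = 0 or 1."""
    if n == 0:
        return math.sin(x) / x
    if n == 1:
        return math.sin(x) / x**2 - math.cos(x) / x
    raise ValueError("this sketch only supports n = 0 or n = 1")


def radial_function(n, kr):
    """Open-sphere radial function b_n(kr) = 4*pi * i^n * j_n(kr) (assumed model)."""
    return 4 * math.pi * (1j ** n) * spherical_jn(n, kr)


def tikhonov_inverse(b, lam):
    """Regularized replacement for 1/b; lam is a hypothetical tuning parameter."""
    return b.conjugate() / (abs(b) ** 2 + lam ** 2)


def inverse_gains(freq_hz, n=1, radius_m=0.042, lam=0.1):
    """Return (|1/b_n|, |regularized inverse|) at one frequency."""
    kr = 2 * math.pi * freq_hz / SPEED_OF_SOUND * radius_m
    b = radial_function(n, kr)
    return abs(1 / b), abs(tikhonov_inverse(b, lam))


def encoding_error(freq_hz, n=1, radius_m=0.042, lam=0.1):
    """|1 - c*b|: deviation from a perfect inverse caused by regularization."""
    kr = 2 * math.pi * freq_hz / SPEED_OF_SOUND * radius_m
    b = radial_function(n, kr)
    return abs(1 - tikhonov_inverse(b, lam) * b)


if __name__ == "__main__":
    for f in (100.0, 500.0, 4000.0):
        plain, reg = inverse_gains(f)
        print(f, plain, reg, encoding_error(f))
```

In this sketch the plain inverse grows as the frequency drops, while the regularized inverse stays bounded by `1 / (2 * lam)`; the price is a non-zero `encoding_error` at exactly those low frequencies. Other regularization methods would shape this trade-off differently, which is the kind of variation the paper studies.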

Paper Structure (12 sections, 9 equations, 5 figures)

Figures (5)

  • Figure 1: $(a)$ noise gain $G_{\mathrm{noise}}$, and $(b)$ distortion $\mathrm{DIST}$, as a function of frequency. In this example, $\mathbf{a}$ represents a unit-amplitude plane wave and $r=4.2$ cm.
  • Figure 2: DNN-DPD based DOA estimation algorithm, showing the computation of the features and the details of the DNN-DPD block, followed by MUSIC. The network output, $(f_1, f_2)$, indicates the probability that the input contains the direct signal. $(\hat{\theta}, \hat{\phi})$ is the estimated DOA of the speaker.
  • Figure 3: DOA error as a function of the top-selected bins, for simulated test data. Bins were selected according to the DNN-DPD network output $f_1$ in Fig. 2.
  • Figure 4: DOA error as a function of frequency band for the test data. The top-scoring 5% of bins were selected from each frequency band. The markers represent the main frequency of each frequency band.
  • Figure 5: DOA estimation error as a function of selected top bins, for simulated and LOCATA data, with the informed and uninformed algorithms.
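The bin selection mentioned in the captions of Figures 3 and 4 (keeping a top-scoring fraction of bins, ranked by the network output $f_1$) could be sketched as follows. The function name `select_top_bins`, the rounding rule, and the example scores are hypothetical and chosen for illustration; the paper's actual selection code is not shown here.

```python
import math


def select_top_bins(scores, fraction):
    """Return the indices of the top-scoring `fraction` of bins, in ascending order.

    `scores` holds one score per time-frequency bin (for example, the network
    output f_1). At least one bin is kept when the list is non-empty
    (assumed rounding rule).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    count = max(1, math.ceil(fraction * len(scores)))
    # Rank bin indices from highest to lowest score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:count])


if __name__ == "__main__":
    example_scores = [0.1, 0.9, 0.4, 0.8]  # made-up scores, not real data
    print(select_top_bins(example_scores, 0.5))
```

To mimic a per-band selection as in Figure 4, the same function could be applied separately to the scores of each frequency band with `fraction=0.05`.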