Neural Ambisonics encoding for compact irregular microphone arrays

Mikko Heikkinen; Archontis Politis; Tuomas Virtanen

TL;DR

This paper tackles Ambisonics encoding for irregular microphone arrays by introducing a learning-based encoder that estimates a complex encoding matrix $\mathbf{M}(t,f)$ from the microphone STFT inputs $\mathbf{x}(t,f)$ and produces Ambisonic signals via $\bar{\mathbf{b}}(t,f)=\mathbf{M}(t,f)\mathbf{x}(t,f)$. The method combines a frequency-specific preprocessing layer with a U-Net and is trained with a composite loss of mean absolute error (MAE), energy-preservation, and coherence terms to balance spectral accuracy and spatial fidelity. Evaluation on simulated reverberant scenes with irregular and regular four-microphone arrays shows that the DNN encoder can meet or exceed a conventional time-invariant least-squares (LS) encoder, performing better below the spatial aliasing frequency and yielding improved coherence above it, especially for irregular geometries. The approach enables more flexible, device-independent spatial audio capture for XR and related applications with nonuniform microphone geometries, and could be extended to account for microphone directivity and enclosure effects.
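
The product $\bar{\mathbf{b}}(t,f)=\mathbf{M}(t,f)\mathbf{x}(t,f)$ is applied independently in every time-frequency bin. The NumPy sketch below shows that product and, for contrast, one common way to build a time-invariant LS encoder $\mathbf{M}(f)$: fit first-order spherical harmonics over a grid of plane-wave directions with Tikhonov regularization. The free-field omnidirectional steering model, the ACN/SN3D channel convention, the regularization weight `beta`, the microphone positions, and all function names are assumptions for illustration; the paper's baseline may be formulated differently.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def fibonacci_sphere(n):
    """Return n roughly uniform unit vectors, shape (n, 3)."""
    i = np.arange(n) + 0.5
    el = np.arcsin(1.0 - 2.0 * i / n)            # elevation in (-pi/2, pi/2)
    az = np.pi * (3.0 - np.sqrt(5.0)) * i        # golden-angle azimuth steps
    return np.stack([np.cos(el) * np.cos(az),
                     np.cos(el) * np.sin(az),
                     np.sin(el)], axis=1)


def sh_first_order(dirs):
    """First-order real SH, ACN order (W, Y, Z, X), SN3D norm: shape (4, K)."""
    x, y, z = dirs.T
    return np.stack([np.ones_like(x), y, z, x])


def steering(mic_pos, dirs, f):
    """Assumed free-field plane-wave responses of omni mics: shape (Q, K)."""
    k = 2.0 * np.pi * f / C
    return np.exp(1j * k * (mic_pos @ dirs.T))


def ls_encoder(mic_pos, freqs, n_dirs=240, beta=1e-2):
    """Time-invariant regularized LS encoder M(f): shape (F, 4, Q)."""
    dirs = fibonacci_sphere(n_dirs)
    Y = sh_first_order(dirs)                     # (4, K)
    Q = mic_pos.shape[0]
    M = np.empty((len(freqs), Y.shape[0], Q), dtype=complex)
    for i, f in enumerate(freqs):
        H = steering(mic_pos, dirs, f)           # (Q, K)
        HHh = H @ H.conj().T                     # (Q, Q)
        lam = beta * np.trace(HHh).real / Q      # Tikhonov weight
        A = Y @ H.conj().T                       # (4, Q)
        B = HHh + lam * np.eye(Q)
        # M = A B^{-1}, computed as solve(B^T, A^T)^T (no explicit inverse)
        M[i] = np.linalg.solve(B.T, A.T).T
    return M


def apply_encoder(M, X):
    """b_bar(t,f) = M x(t,f) for M of shape (F,N,Q) or (T,F,N,Q); X is (T,F,Q)."""
    if M.ndim == 3:                              # one matrix per frequency
        return np.einsum('fnq,tfq->tfn', M, X)
    return np.einsum('tfnq,tfq->tfn', M, X)      # one matrix per (t, f) bin


fs, n_fft = 48000, 512
freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)       # 257 bins
# Hypothetical irregular 4-mic layout in metres (not the paper's array)
mic_pos = np.array([[0.020, 0.000, 0.000],
                    [-0.010, 0.015, 0.000],
                    [0.000, -0.020, 0.010],
                    [-0.005, 0.000, -0.015]])
M_ls = ls_encoder(mic_pos, freqs)                # (257, 4, 4)

rng = np.random.default_rng(0)
X = rng.standard_normal((10, len(freqs), 4)) + 1j * rng.standard_normal((10, len(freqs), 4))
B_ls = apply_encoder(M_ls, X)                    # (10, 257, 4)

# A DNN encoder instead outputs one matrix per bin; tiling M_ls over time
# shows the time-varying path reduces to the time-invariant one.
M_tf = np.broadcast_to(M_ls, (X.shape[0],) + M_ls.shape)
B_tf = apply_encoder(M_tf, X)
```

The only structural difference between the two encoders in this sketch is the leading time axis of $\mathbf{M}$: the LS baseline shares one matrix across all frames, whereas a signal-dependent encoder can change it from frame to frame.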

Abstract

Ambisonics encoding of microphone array signals can enable various spatial audio applications, such as virtual reality or telepresence, but it is typically designed for uniformly spaced spherical microphone arrays. This paper proposes a method for Ambisonics encoding that uses a deep neural network (DNN) to estimate a signal transform from microphone inputs to Ambisonics signals. The DNN consists of a U-Net with a learnable preprocessing layer and is trained with a loss function composed of mean absolute error, spatial correlation, and energy preservation terms. The method is validated on two four-microphone arrays, one regular and one irregular, using simulated reverberant scenes with multiple sources. The results show that the proposed method can meet or exceed the performance of a conventional signal-independent Ambisonics encoder on a number of error metrics.
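
A composite loss of this kind can be sketched as a weighted sum of three terms computed on complex Ambisonic STFTs. The NumPy sketch below is hypothetical: the term definitions (complex-difference MAE, relative per-channel energy mismatch, and a normalized cross-spectral coherence standing in for the spatial-correlation term), the weights `w_mae`, `w_energy`, `w_coh`, and the function name are assumptions, not the paper's exact formulation.

```python
import numpy as np


def composite_loss(B_hat, B_ref, w_mae=1.0, w_energy=1.0, w_coh=1.0, eps=1e-8):
    """Hypothetical composite loss on complex Ambisonic STFTs shaped (T, F, N)."""
    # MAE: mean magnitude of the complex difference over all bins and channels
    mae = np.mean(np.abs(B_hat - B_ref))

    # Energy preservation: relative mismatch of per-frequency, per-channel energy
    e_hat = np.sum(np.abs(B_hat) ** 2, axis=0)               # (F, N)
    e_ref = np.sum(np.abs(B_ref) ** 2, axis=0)               # (F, N)
    energy = np.mean(np.abs(e_hat - e_ref) / (e_ref + eps))

    # Coherence: 1 minus the normalized cross-spectrum magnitude over time
    cross = np.abs(np.sum(B_hat * np.conj(B_ref), axis=0))   # (F, N)
    coh = cross / (np.sqrt(e_hat * e_ref) + eps)
    coherence = np.mean(1.0 - coh)

    total = w_mae * mae + w_energy * energy + w_coh * coherence
    return total, {"mae": mae, "energy": energy, "coherence": coherence}


rng = np.random.default_rng(1)
B_ref = rng.standard_normal((20, 33, 4)) + 1j * rng.standard_normal((20, 33, 4))
loss_same, _ = composite_loss(B_ref, B_ref)                  # perfect estimate
loss_gain, parts_gain = composite_loss(2.0 * B_ref, B_ref)   # 6 dB too loud
```

In this sketch the coherence term is insensitive to a pure gain error (the doubled estimate is still fully coherent with the reference), which is why a separate energy term is needed to penalize level mismatch.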

Paper Structure (11 sections, 9 equations, 2 figures)

Introduction
Proposed Methods
Problem Setup
DNN Method
Evaluation
Data
Model parameters
Baseline
Metrics
Results
Conclusions

Figures (2)

Figure 1: Block diagram of the proposed method.
Figure 2: Left: evaluation metrics averaged over all channels. Right: mean magnitude errors for individual channels.