SIRUP: A diffusion-based virtual upmixer of steering vectors for highly-directive spatialization with first-order ambisonics

Emilio Picard; Diego Di Carlo; Aditya Arie Nugraha; Mathieu Fontaine; Kazuyoshi Yoshii

TL;DR

This paper presents a latent diffusion model architecture for virtually upmixing steering vectors captured by a spherical microphone array with few channels. The method achieved a significant improvement over FOA systems in steering vector upmixing, source localization, and speech denoising.

Abstract

This paper presents virtual upmixing of steering vectors captured by a spherical microphone array with few channels. This challenge has conventionally been addressed by recovering the directions and signals of sound sources from first-order ambisonics (FOA) data and then rendering higher-order ambisonics (HOA) data with a physics-based acoustic simulator. This approach, however, struggles with the mutual dependency between the spatial directivity of source estimation and the spatial resolution of the FOA data. Our method, named SIRUP, employs a latent diffusion model architecture. Specifically, a variational autoencoder (VAE) learns a compact encoding of the HOA data in a latent space, and a diffusion model is then trained to generate the HOA embeddings conditioned on the FOA data. Experimental results showed that SIRUP achieved a significant improvement over FOA systems in steering vector upmixing, source localization, and speech denoising.
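The limited spatial resolution of FOA data mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes idealized, frequency-independent FOA plane-wave encoding gains (ACN channel order W, Y, Z, X with SN3D normalization) as a stand-in for measured steering vectors, and a noiseless single-source snapshot. The names `foa_steering_vector` and `srp_map` are hypothetical helpers introduced only for this example.

```python
import numpy as np

def foa_steering_vector(azimuth, elevation=0.0):
    """Idealized FOA encoding gains (assumed ACN order W, Y, Z, X; SN3D)."""
    return np.array([
        1.0,
        np.sin(azimuth) * np.cos(elevation),
        np.sin(elevation),
        np.cos(azimuth) * np.cos(elevation),
    ])

def srp_map(x, azimuth_grid):
    """Steered response power of one FOA snapshot x over candidate azimuths."""
    steering = np.stack([foa_steering_vector(az) for az in azimuth_grid])  # (G, 4)
    return np.abs(steering @ x) ** 2

# Candidate directions on the horizontal plane, every 5 degrees
azimuth_grid = np.deg2rad(np.arange(-180, 180, 5))

# Assumed toy scene: one source at 40 degrees, amplitude 0.7, no noise
true_azimuth = np.deg2rad(40.0)
x = 0.7 * foa_steering_vector(true_azimuth)

power = srp_map(x, azimuth_grid)
estimated_azimuth = azimuth_grid[np.argmax(power)]

# Power steered 90 degrees away from the source, relative to the peak
offset_index = np.argmin(np.abs(azimuth_grid - np.deg2rad(130.0)))
relative_power_at_90deg = power[offset_index] / power.max()

print(np.rad2deg(estimated_azimuth), relative_power_at_90deg)
```

Under these assumptions the power map is proportional to (1 + cos Δ)², where Δ is the angle between the steered and true directions. The peak falls on the true direction, but the response 90 degrees away still carries a quarter of the peak power, which is the kind of coarse directivity that upmixing to higher orders aims to sharpen.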

Paper Structure (15 sections, 7 equations, 3 figures, 2 tables)

Introduction
Background
Steering vectors
Applications
Latent diffusion models for image out-painting
Proposed Method
Steering vector upmixing
Downstream tasks with up-mixed steering vectors
Evaluation
Experimental settings
Experimental data
Model configuration
Evaluation metrics
Experimental results
Conclusion

Figures (3)

Figure 1: The SIRUP upmixer for downstream tasks.
Figure 2: Average angular errors across conditions using different localization methods and SV models, using SRP (Eq. (\ref{eq:beamforming})).
Figure 3: 2D heatmap comparison of the estimated SV for the FOA and ground-truth HOA setups.
