Velocity Potential Neural Field for Efficient Ambisonics Impulse Response Modeling

Yoshiki Masuyama, Francois G. Germain, Gordon Wichern, Chiori Hori, Jonathan Le Roux

Abstract

First-order Ambisonics (FOA) is a standard spatial audio format based on spherical harmonic decomposition. Its zeroth- and first-order components capture the sound pressure and particle velocity, respectively. Recently, physics-informed neural networks have been applied to the spatial interpolation of FOA signals, regularizing the network outputs with soft penalty terms derived from physical principles, e.g., the linearized momentum equation. In this paper, we reformulate the task so that the predicted FOA signal automatically satisfies the linearized momentum equation. Our network approximates a scalar function called the velocity potential, rather than the FOA signal itself. The FOA signal can then be readily recovered through the partial derivatives of the velocity potential with respect to the network inputs (i.e., time and microphone position), according to the physics of sound propagation. By deriving all four FOA channels from the single-channel velocity potential, the reconstructed signal follows the physical principle at any time and position by construction. Experimental results on room impulse response reconstruction confirm the effectiveness of the proposed framework.
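
The key step above can be illustrated with a small sketch. It assumes the common convention $p = \rho\,\partial\phi/\partial t$ and $\mathbf{v} = -\nabla\phi$ (sign and scaling conventions vary, and the paper's exact choice is not stated here). Under that convention, $\rho\,\partial\mathbf{v}/\partial t = -\rho\,\nabla(\partial\phi/\partial t) = -\nabla p$, so the linearized momentum equation holds for any smooth potential because its mixed partial derivatives commute. The toy model below is hypothetical: a one-hidden-layer tanh MLP with random, untrained weights, hand-derived input gradients standing in for automatic differentiation, assumed values for $\rho$ and $c$, and an assumed FOA scaling of $(X, Y, Z) = \rho c\,\mathbf{v}$. It derives all four channels from one scalar output and checks the momentum residual numerically.

```python
import numpy as np

# Hypothetical toy velocity-potential network: a one-hidden-layer tanh MLP
# mapping (t, x, y, z) -> phi. Weights are random; training is not shown.
rng = np.random.default_rng(0)
HIDDEN = 32
W1 = rng.normal(size=(HIDDEN, 4))
b1 = rng.normal(size=HIDDEN)
w2 = rng.normal(size=HIDDEN) / np.sqrt(HIDDEN)

RHO = 1.2   # air density [kg/m^3] (assumed value)
C = 343.0   # speed of sound [m/s] (assumed value)


def potential(inp):
    """Scalar velocity potential phi for a length-4 input (t, x, y, z)."""
    return float(w2 @ np.tanh(W1 @ inp + b1))


def potential_grad(inp):
    """Exact gradient of phi w.r.t. (t, x, y, z).

    Hand-derived chain rule standing in for automatic differentiation:
    d/dz [w2 . tanh(W1 z + b1)] = W1^T (w2 * (1 - tanh^2(W1 z + b1))).
    """
    h = np.tanh(W1 @ inp + b1)
    return W1.T @ (w2 * (1.0 - h ** 2))


def pressure(t, pos):
    """Assumed convention: p = rho * dphi/dt (zeroth-order component)."""
    return RHO * potential_grad(np.array([t, *pos], dtype=float))[0]


def velocity(t, pos):
    """Assumed convention: v = -grad_x phi (first-order components)."""
    return -potential_grad(np.array([t, *pos], dtype=float))[1:]


def foa_from_potential(t, pos):
    """Return (W, X, Y, Z) from the single-channel potential.

    W = p and (X, Y, Z) = rho * c * v is an assumed normalization;
    real FOA formats may use different gains and signs.
    """
    return np.concatenate(([pressure(t, pos)], RHO * C * velocity(t, pos)))


def momentum_residual(t, pos, eps=1e-5):
    """rho * dv/dt + grad p via central differences; ~0 by construction."""
    pos = np.asarray(pos, dtype=float)
    dv_dt = (velocity(t + eps, pos) - velocity(t - eps, pos)) / (2 * eps)
    grad_p = np.array([
        (pressure(t, pos + eps * e) - pressure(t, pos - eps * e)) / (2 * eps)
        for e in np.eye(3)
    ])
    return RHO * dv_dt + grad_p
```

Because the residual vanishes for any choice of weights, no separate penalty term is needed to enforce the momentum equation; a real implementation would obtain the input derivatives with an autodiff framework instead of a hand-written gradient.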

Paper Structure (11 sections, 19 equations, 4 figures, 1 table)

Figures (4)

  • Figure 1: Overview of the proposed VPNF, where bold solid arrows indicate automatic differentiation. We predict the zeroth- and first-order components of Ambisonics by computing the partial derivatives of the potential with respect to time and position, respectively.
  • Figure 2: Overhead view of the room geometry for simulation.
  • Figure 3: NMSE [dB] averaged over ten rooms with different numbers of measurements.
  • Figure 4: Average Pearson’s correlation coefficients for the $W$- and $(X, Y, Z)$-channels.
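
Figures 3 and 4 report NMSE in dB and Pearson's correlation coefficients. A minimal sketch of both metrics follows. It assumes the standard definitions, $\mathrm{NMSE} = 10\log_{10}(\|\hat{\mathbf{s}} - \mathbf{s}\|^2 / \|\mathbf{s}\|^2)$ and a per-channel Pearson coefficient on arrays of shape (4, num_samples) in $W, X, Y, Z$ order; the paper's exact normalization and its averaging over rooms may differ, and the function names are hypothetical.

```python
import numpy as np


def nmse_db(estimate, reference):
    """NMSE in dB (assumed standard definition):
    10 * log10(||estimate - reference||^2 / ||reference||^2)."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    err = np.sum((estimate - reference) ** 2)
    return 10.0 * np.log10(err / np.sum(reference ** 2))


def pearson_per_channel(estimate, reference):
    """Pearson correlation coefficient for each FOA channel.

    Both inputs are assumed to have shape (4, num_samples), W, X, Y, Z order.
    """
    return np.array([
        np.corrcoef(est, ref)[0, 1] for est, ref in zip(estimate, reference)
    ])
```

For a summary in the style of Figure 4, `corr[0]` of the returned array gives the $W$-channel value and `corr[1:].mean()` averages the $(X, Y, Z)$-channels.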