Differentiable Grouped Feedback Delay Networks for Learning Coupled Volume Acoustics

Orchisama Das, Gloria Dal Santo, Sebastian J. Schlecht, Vesa Valimaki, Zoran Cvetkovic

TL;DR

This work proposes the concept of Differentiable GFDNs (DiffGFDNs), which have tunable parameters that are optimised to match the late reverberation profile of a set of RIRs captured from a space that exhibits multi-slope decay.

Abstract

Rendering dynamic reverberation in a complicated acoustic space for moving sources and listeners is challenging but crucial for enhancing user immersion in extended-reality (XR) applications. Capturing spatially varying room impulse responses (RIRs) is costly and often impractical. Moreover, dynamic convolution with measured RIRs is computationally expensive and memory-intensive, demanding resources that are typically not available on wearable computing devices. Grouped Feedback Delay Networks (GFDNs), on the other hand, allow efficient rendering of coupled room acoustics. However, their parameters need to be tuned to match the reverberation profile of a coupled space. In this work, we propose the concept of Differentiable GFDNs (DiffGFDNs), which have tunable parameters that are optimised to match the late reverberation profile of a set of RIRs captured from a space that exhibits multi-slope decay. Once trained on a finite set of measurements, the DiffGFDN interpolates to unmeasured locations in the space. We propose a parallel processing pipeline in which multiple DiffGFDNs with frequency-independent parameters each process one octave band. The parameters of the DiffGFDN can be updated rapidly during inferencing as sources and listeners move. We evaluate the proposed architecture against the Common Slopes (CS) model on a dataset of RIRs for three coupled rooms. The proposed architecture generates multi-slope late reverberation with low memory and computational requirements, achieving a lower energy decay relief (EDR) error and slightly higher octave-band energy decay curve (EDC) errors than the CS model. Furthermore, the DiffGFDN requires an order of magnitude fewer floating-point operations per sample than the CS renderer.
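The abstract evaluates decay behaviour through energy decay curves (EDCs) in a space with multi-slope decay. As a minimal sketch of what that involves, the following pure-Python example computes an EDC by Schroeder backward integration on a synthetic two-slope envelope. The function names, sample rate, decay times and gain are illustrative assumptions rather than values from the paper, and the noise-free envelope only stands in for a measured RIR.

```python
import math


def energy_decay_curve_db(rir):
    """Schroeder backward integration.

    EDC[n] is the energy remaining from sample n onwards, returned in dB
    relative to the total energy, so the curve starts at 0 dB.
    """
    edc = [0.0] * len(rir)
    running = 0.0
    for n in range(len(rir) - 1, -1, -1):
        running += rir[n] * rir[n]
        edc[n] = running
    total = edc[0]
    return [10.0 * math.log10(e / total) if e > 0.0 else float("-inf")
            for e in edc]


def two_slope_envelope(fs, duration_s, t60_fast, t60_slow, gain_slow):
    """Amplitude envelope whose energy is the sum of two exponential decays.

    All parameter values passed in are hypothetical; an amplitude term
    10^(-3 t / T60) drops by 60 dB after T60 seconds.
    """
    envelope = []
    for i in range(int(fs * duration_s)):
        t = i / fs
        fast = 10.0 ** (-3.0 * t / t60_fast)
        slow = gain_slow * 10.0 ** (-3.0 * t / t60_slow)
        envelope.append(math.sqrt(fast * fast + slow * slow))
    return envelope


def slope_db_per_s(edc_db, fs, start_s, stop_s):
    """Average EDC slope between two time instants, in dB per second."""
    a, b = int(start_s * fs), int(stop_s * fs)
    return (edc_db[b] - edc_db[a]) / (stop_s - start_s)
```

With assumed values such as `two_slope_envelope(1000, 2.0, 0.3, 1.5, 0.1)`, the early part of the curve should fall at close to 200 dB/s (60 dB in 0.3 s) before bending towards about 40 dB/s (60 dB in 1.5 s), which is the kind of knee that a single decay rate cannot reproduce.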

Paper Structure

This paper contains 26 sections, 44 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Differentiable Grouped FDN architecture for $G = 3$ groups. The bold lines represent multichannel signals, and the different colours represent different groups. The vectors $\mathbf{s}_p$ and $\mathbf{r}_q$ represent source and receiver positions, respectively, and $H_{pq}(z)$ and $\hat{H}_{pq}(z)$ are the reference and predicted transfer functions, respectively, for the position pair $(\mathbf{s}_p, \mathbf{r}_q)$.
  • Figure 2: Training (left) and inferencing (right) pipelines for subband processing of the input signal with a parallel network of DiffGFDNs.
  • Figure 3: The left plot shows the magnitude response of one group in the GFDN before training, and the right plot shows the magnitude response of the same group after training. Before training, the GFDN has random input-output gains and a random unitary feedback matrix; training tunes these parameters for minimal colouration. A flat magnitude response is desirable. The different colours show the magnitude response in different subbands; the magnitude response summed over all subbands is offset by $20$ dB and shown as a yellowish-green dotted line.
  • Figure 4: Comparison of normalised echo densities (NEDs) at position (9.3, 6.6, 1.5) m before (blue) and after (orange) training for different subband GFDNs. A faster rise in NED is desired for smoother reverberation.
  • Figure 5: Mean EDC fit error between the original and CS-synthesised RIRs at all receiver positions in the Treble dataset. An error of $0$ dB indicates a perfect match. The source position is marked with a red cross. The CS model required an RIR measurement at all positions.
  • ...and 3 more figures
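
The caption of Figure 3 refers to a GFDN group with input-output gains and a unitary feedback matrix. As a rough illustration of that building block only, the sketch below runs a single, ungrouped feedback delay network with one frequency-independent decay time. The Householder feedback matrix, the unit input and output gains, the delay lengths and the function names are assumptions chosen for brevity; this is not the DiffGFDN itself and involves no training.

```python
def householder(n):
    """Orthogonal n x n Householder matrix I - (2/n) * ones."""
    return [[(1.0 if i == j else 0.0) - 2.0 / n for j in range(n)]
            for i in range(n)]


def fdn_impulse_response(delays, t60, fs, num_samples):
    """Impulse response of a basic FDN with a broadband decay time t60.

    delays: delay-line lengths in samples (hypothetical values).
    Each line is attenuated by 10^(-3 m / (fs * t60)), i.e. the signal
    loses 60 dB for every t60 seconds it spends circulating.
    """
    n = len(delays)
    feedback = householder(n)
    gains = [10.0 ** (-3.0 * m / (fs * t60)) for m in delays]
    buffers = [[0.0] * m for m in delays]  # circular delay lines
    pos = [0] * n
    out = []
    for t in range(num_samples):
        x = 1.0 if t == 0 else 0.0  # unit impulse input
        # Read the attenuated delay-line outputs.
        s = [gains[i] * buffers[i][pos[i]] for i in range(n)]
        out.append(sum(s))  # unit output gains (assumption)
        # Mix through the orthogonal matrix, add the input, write back.
        for i in range(n):
            mixed = sum(feedback[i][j] * s[j] for j in range(n))
            buffers[i][pos[i]] = mixed + x  # unit input gains (assumption)
            pos[i] = (pos[i] + 1) % delays[i]
    return out
```

Because the feedback matrix is orthogonal, the only energy loss comes from the per-line gains, so the decay rate is set by `t60` alone. A grouped network would, in principle, give each group of delay lines its own decay time and couple the groups through the feedback matrix.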