
Fully Reversing the Shoebox Image Source Method: From Impulse Responses to Room Parameters

Tom Sprunck, Antoine Deleforge, Yannick Privat, Cédric Foy

TL;DR

This work investigates the reversibility of the shoebox image source method (ISM) for room impulse responses (RIRs). It introduces an open-source, three-stage algorithm that recovers the 18 forward parameters from a discrete, low-passed multichannel RIR: the source position, the room dimensions, the 6-DOF room pose, and the six wall absorption coefficients. The method combines gridless image-source localization with room-axes recovery and first-order image-source labeling to infer the complete geometry. Extensive simulations show near-exact recovery for a 32-channel, 8.4-cm-wide spherical microphone array at 16 kHz, with errors decaying as array size and sampling rate increase. Compared to a Dokmanić EDM baseline, the proposed approach yields substantially more accurate geometry and enables reliable RIR extrapolation, demonstrating, for the first time to the authors' knowledge, that this classical forward model is invertible over a wide range of configurations. Applicability to real data remains future work and would require extensions to account for angular and frequency dependencies and for potential occlusions.

Abstract

We present an algorithm that fully reverses the shoebox image source method (ISM), a widely used room impulse response (RIR) simulator for cuboid rooms introduced by Allen and Berkley in 1979. More precisely, given a discrete multichannel RIR generated by the shoebox ISM for a microphone array of known geometry, the algorithm reliably recovers the 18 input parameters. These are the 3D source position, the 3 dimensions of the room, the 6-degrees-of-freedom room translation and orientation, and an absorption coefficient for each of the 6 room boundaries. The approach builds on a recently proposed gridless image source localization technique combined with new procedures for room axes recovery and first-order-reflection identification. Extensive simulated experiments reveal that near-exact recovery of all parameters is achieved for a 32-element, 8.4-cm-wide spherical microphone array and a sampling rate of 16 kHz using fully randomized input parameters within rooms of size 2×2×2 to 10×10×5 meters. Estimation errors decay towards zero when increasing the array size and sampling rate. The method is also shown to strongly outperform a known baseline, and its ability to extrapolate RIRs at new positions is demonstrated. Crucially, the approach is strictly limited to low-passed discrete RIRs simulated using the vanilla shoebox ISM. Nonetheless, it represents, to our knowledge, the first algorithmic demonstration that this difficult inverse problem is in principle fully solvable over a wide range of configurations.
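
To make the forward model concrete, the following is a minimal Python sketch of the geometric part of the shoebox ISM, following the classical Allen and Berkley construction. It is not the authors' implementation. The function name `shoebox_image_sources`, the wall ordering of the absorption coefficients, the relation between absorption and reflection coefficient, and the pose convention (world position = rotation applied to the room-frame position, plus translation) are all assumptions made for illustration. The 18 parameters appear as 3 (source) + 3 (dimensions) + 6 (pose) + 6 (absorption).

```python
import itertools
import numpy as np

def shoebox_image_sources(source, dims, absorption, max_index=1,
                          rotation=None, translation=None):
    """Enumerate image sources of a shoebox room (hypothetical helper).

    source     : 3D source position in the room frame (origin at a corner)
    dims       : room dimensions (Lx, Ly, Lz)
    absorption : 6 coefficients, assumed order [x=0, x=Lx, y=0, y=Ly, z=0, z=Lz]
    rotation, translation : assumed room pose (3 + 3 degrees of freedom)
    """
    source = np.asarray(source, dtype=float)
    dims = np.asarray(dims, dtype=float)
    # Assumed energy-absorption convention: reflection coefficient = sqrt(1 - alpha).
    beta = np.sqrt(1.0 - np.asarray(absorption, dtype=float)).reshape(3, 2)
    R = np.eye(3) if rotation is None else np.asarray(rotation, dtype=float)
    t = np.zeros(3) if translation is None else np.asarray(translation, dtype=float)

    positions, gains, orders = [], [], []
    index_range = range(-max_index, max_index + 1)
    for n in itertools.product(index_range, repeat=3):
        for q in itertools.product((0, 1), repeat=3):
            n_arr, q_arr = np.array(n), np.array(q)
            # Per axis, images lie at 2*n*L + s (q=0) or 2*n*L - s (q=1).
            pos = (1 - 2 * q_arr) * source + 2 * n_arr * dims
            hits_wall_0 = np.abs(n_arr - q_arr)  # reflections on the wall at 0
            hits_wall_L = np.abs(n_arr)          # reflections on the wall at L
            gain = np.prod(beta[:, 0] ** hits_wall_0 * beta[:, 1] ** hits_wall_L)
            positions.append(R @ pos + t)
            gains.append(gain)
            orders.append(int(hits_wall_0.sum() + hits_wall_L.sum()))
    return np.array(positions), np.array(gains), np.array(orders)

# Example with arbitrary illustrative values (not taken from the paper).
positions, gains, orders = shoebox_image_sources(
    source=[1.0, 1.5, 1.2], dims=[4.0, 3.0, 2.5], absorption=[0.2] * 6)
first_order = positions[orders == 1]  # one image source per wall
print(first_order)
```

The first-order subset contains one image source per wall, which is why labeling the first-order image sources is enough to read off the wall positions and, from the gains, the absorption coefficients. With `max_index=1` the enumeration is only complete for low reflection orders; a larger `max_index` is needed for higher orders.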

Paper Structure (19 sections, 1 theorem, 21 equations, 8 figures, 1 table, 2 algorithms)


Key Result

Proposition 1

Let $N_1, N_2, N_3$ be non-zero even integers. Consider the following subset of image sources: $\mathcal{G}=\{\boldsymbol{r}_{\boldsymbol q,\boldsymbol \varepsilon}, \; \boldsymbol q\in[\![0 ,N_1/2-1]\!]\times [\![0 ,N_2/2-1]\!] \times [\![0 ,N_3/2-1]\!] ,\; \varepsilon_i \in \{-1,1\}\}$ with …

Figures (8)

  • Figure 1: (a) Reconstructed image-source point cloud using the method of [sprunck2022gridless]. (b) Associated $J_3^\sigma$ score plotted on the sphere (brighter is higher). A sharp peak is observed in the direction of a wall normal.
  • Figure 2: Projection of the estimated sources on $\hat{\boldsymbol e}_1$ (blue) and the associated 2D $J_{2, \hat{\boldsymbol e}_1}^\sigma$ score (red). We observe maximal values in the directions of the wall normals.
  • Figure 3: Geometries of the microphone arrays used in the experiments. (a) Smallest spherical array used in the results section, (b) non-spherical array used in the baseline comparison section.
  • Figure 4: Mean absolute errors on room dimensions (a), mean Euclidean errors on room center (b) and mean angular error on room orientation (c) as a function of the sampling frequency, for varying array radii.
  • Figure 5: Mean absolute error on absorption coefficients recovered below a $0.3$ threshold for varying array radii and sampling frequencies. The recall for this threshold is indicated above each bar, in percent.
  • ...and 3 more figures

Theorems & Definitions (2)

  • Proposition 1
  • proof