Direct and Residual Subspace Decomposition of Spatial Room Impulse Responses

Thomas Deppisch, Sebastià V. Amengual Garí, Paul Calamia, Jens Ahrens

TL;DR

This work proposes a subspace method that decomposes SRIRs into a direct part, which comprises the direct sound and the salient reflections, and a residual, to facilitate enhanced analysis and rendering methods by providing individual access to these components.

Abstract

Psychoacoustic experiments have shown that directional properties of the direct sound, salient reflections, and the late reverberation of an acoustic room response can have a distinct influence on the auditory perception of a given room. Spatial room impulse responses (SRIRs) capture those properties and thus are used for direction-dependent room acoustic analysis and virtual acoustic rendering. This work proposes a subspace method that decomposes SRIRs into a direct part, which comprises the direct sound and the salient reflections, and a residual, to facilitate enhanced analysis and rendering methods by providing individual access to these components. The proposed method is based on the generalized singular value decomposition and interprets the residual as noise that is to be separated from the other components of the reverberation. Large generalized singular values are attributed to the direct part, which is then obtained as a low-rank approximation of the SRIR. By advancing from the end of the SRIR toward the beginning while iteratively updating the residual estimate, the method adapts to spatio-temporal variations of the residual. The method is evaluated using a spatio-spectral error measure and simulated SRIRs of different rooms, microphone arrays, and ratios of direct sound to residual energy. The proposed method creates lower errors than existing approaches in all tested scenarios, including a scenario with two simultaneous reflections. A case study with measured SRIRs shows the applicability of the method under real-world acoustic conditions. A reference implementation is provided.
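The low-rank step described in the abstract can be illustrated with a minimal sketch. It is not the reference implementation. Instead of calling a GSVD routine, it prewhitens the signal block with the triangular factor of the residual estimate and takes an ordinary SVD, whose singular values equal the generalized singular values of the pair. The function name `split_block`, the fixed component count `n_direct`, and the toy data are assumptions made for illustration only.

```python
import numpy as np


def split_block(block, resid, n_direct):
    """Split one signal block into a direct and a residual part.

    block, resid: arrays of shape (samples, channels).
    n_direct: number of components assigned to the direct subspace
              (hypothetical fixed value; the paper selects it adaptively).
    """
    # Triangular factor of the residual estimate: resid.T @ resid == r.T @ r.
    r = np.linalg.qr(resid, mode="r")
    # Prewhitening: block @ inv(r), computed without forming the inverse.
    whitened = np.linalg.solve(r.T, block.T).T
    u, gsv, vt = np.linalg.svd(whitened, full_matrices=False)
    # Keep the largest components, then undo the whitening.
    direct = (u[:, :n_direct] * gsv[:n_direct]) @ vt[:n_direct] @ r
    return direct, block - direct, gsv


# Toy data (hypothetical): one coherent component in mildly colored noise.
rng = np.random.default_rng(0)
n_samples, n_channels = 256, 8
mixing = np.eye(n_channels) + 0.1 * rng.standard_normal((n_channels, n_channels))
resid_est = rng.standard_normal((n_samples, n_channels)) @ mixing
noise = rng.standard_normal((n_samples, n_channels)) @ mixing
steering = rng.standard_normal(n_channels)
true_direct = 10.0 * np.outer(rng.standard_normal(n_samples), steering)
block = true_direct + noise

direct, residual, gsv = split_block(block, resid_est, n_direct=1)
err_raw = np.linalg.norm(block - true_direct)   # error of the unprocessed block
err_sub = np.linalg.norm(direct - true_direct)  # error of the low-rank estimate
print(gsv.round(2), err_sub / err_raw)
```

In this toy setup the first generalized singular value stands well apart from the rest, and the rank-1 estimate is closer to the synthetic direct component than the unprocessed block is.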

Paper Structure

This paper contains 18 sections, 17 equations, 10 figures, and 1 algorithm.

Figures (10)

  • Figure 1: The subspace decomposition can be performed if the source covariance matrix $\bm R_{\mathrm s}$ is singular, i.e., if its rank ${Q_{\mathrm s} < M}$. (a) The mean rank $\bar{Q}_{\mathrm s}$ due to a single impinging plane wave depends on the number of microphones $M$, on the array radius $r$, and on whether the array surface is rigid or open. The matrix is singular if the rank stays below the dashed gray line, which marks the number of microphones. (b) The source covariance matrix of an SRIR simulated using the image source method is singular in the early part. The summed magnitude of the SRIR is shown in gray for reference.
  • Figure 2: Direct and residual subspace decomposition of a 32-channel SRIR $\bm x(t)$. (a) The proposed algorithm first takes an initial residual estimate from the end of the SRIR. It then proceeds toward the beginning of the SRIR and performs the GSVD on every signal block. If the sum of the GSVs $\bm \sigma(t)$ is below the detection threshold, the residual estimate is updated. If their sum exceeds the threshold, the subspace decomposition is performed. (b) A zoomed-in part of the SRIR contains a salient reflection. (c) The eight largest GSVs of the zoomed-in part exhibit a distinct peak at the location of the reflection. The two smallest GSVs do not exhibit a visible peak. (d) The direct signal $\bm x_{\mathrm s}(t)$ contains the salient reflection from (b). (e) The residual signal $\bm x_{\mathrm n}(t)$ does not contain the reflection.
  • Figure 3: The weighted cumulative sums $\zeta(k,t)$ of the GSVs, shown in shades from orange to brown, exhibit distinct peaks at times where reflections occur. (a) The GSV sum, which is the largest of the cumulative sums $\zeta(k,t)$, exceeds the detection threshold, drawn as a solid black line, for the direct sound and each of the 6 reflections. Thus, all direct components are detected. (b) Zoomed-in section of (a) around a reflection. The left and right borders of the gray rectangle mark the time instants between which a reflection is detected. The number of direct subspace components is determined as the number of weighted, cumulatively summed GSVs $\zeta(k,t)$ that exceed the time-averaged sum $\mu(t)$ of the GSVs, which is shown as a dotted black line. At the peak, this results in 6 direct components and 26 residual components.
  • Figure 4: Norms of the direct part $\bm x_{\mathrm s}(t)$ and the residual $\bm x_{\mathrm n}(t)$ of (a) the ground truth, (b) the proposed method applied to unprocessed microphone signals, and (c) the proposed method applied to an SH decomposition of the array signals.
  • Figure 5: Norms of the ground-truth spectra $\bm \chi_\mathrm{s}(f)$ of (a) the first and (b) the seventh reflection from Fig. 4(a), together with the reflection spectra extracted by the spatial subtraction method with two different signal models (SpatSub1 and SpatSub2) and by the proposed subspace decomposition method (SubDec).
  • ...and 5 more figures
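
The backward, block-wise procedure outlined in the caption of Figure 2 can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the paper's algorithm: the GSVs are obtained by prewhitening followed by an ordinary SVD, the residual estimate is simply replaced by the most recent block without a detection, and the threshold is a hypothetical constant. The weighting behind $\zeta(k,t)$ and the time average $\mu(t)$ are not reproduced here, and the names `block_gsvs` and `detect_blocks` are made up for this sketch.

```python
import numpy as np


def block_gsvs(block, resid):
    """GSVs of (block, resid), via prewhitening and an ordinary SVD."""
    r = np.linalg.qr(resid, mode="r")
    whitened = np.linalg.solve(r.T, block.T).T
    return np.linalg.svd(whitened, compute_uv=False)


def detect_blocks(x, block_len, threshold):
    """Walk from the end of x (samples x channels) toward its beginning.

    Returns (start_index, detected) tuples in ascending time order.
    """
    n_blocks = x.shape[0] // block_len
    # Initial residual estimate: the last full block of the response.
    resid = x[(n_blocks - 1) * block_len : n_blocks * block_len]
    flags = []
    for b in range(n_blocks - 2, -1, -1):
        start = b * block_len
        block = x[start : start + block_len]
        detected = block_gsvs(block, resid).sum() > threshold
        if not detected:
            # Assumed update rule: replace the estimate with this block.
            resid = block
        flags.append((start, detected))
    return flags[::-1]


# Toy response (hypothetical): slowly decaying noise plus one strong
# broadband impulse on all channels at sample 200.
rng = np.random.default_rng(1)
n_samples, n_channels, block_len = 640, 4, 64
envelope = np.exp(-np.arange(n_samples) / 2000.0)[:, None]
x = rng.standard_normal((n_samples, n_channels)) * envelope
x[200] += 100.0

flags = detect_blocks(x, block_len, threshold=2.0 * n_channels)
detected_starts = [start for start, hit in flags if hit]
print(detected_starts)
```

Because the residual estimate is refreshed whenever no reflection is detected, the whitening follows the slow energy decay of the toy response, so only the block containing the impulse stands out against the threshold.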