Performance and Robustness of Signal-Dependent vs. Signal-Independent Binaural Signal Matching with Wearable Microphone Arrays

Ami Berger, Vladimir Tourbabin, Jacob Donley, Zamir Ben-Hur, Boaz Rafaely

TL;DR

The paper addresses a limitation of BSM: its diffuse-field assumption is suboptimal for wearable arrays in directional, high-DRR environments. It examines two signal-dependent BSM variants, COMPASS-BSM (COM) and d-BSM, that integrate direct-source information either through a parametric direct-reverberant decomposition or through an informed covariance approach, and derives the corresponding filters $\mathbf{c}^{l,r}_{\mathrm{COM}}$ and $\mathbf{c}^{l,r}_{\text{d-BSM}}$ so that they account for the direct component. The study shows substantial improvements in binaural cues (notably in the source direction) with only minor degradation in other directions, and demonstrates robustness to DOA estimation errors: performance tends to converge toward that of standard BSM when errors are large. Objective metrics (NMSE, ITD, ILD) and a listening test corroborate the gains, highlighting practical benefits for wearable binaural rendering. These results indicate when signal-dependent BSM is worth adopting in wearable audio systems, balancing modeling detail against robustness in real-world conditions.

Abstract

The increasing popularity of spatial audio in applications such as teleconferencing, entertainment, and virtual reality has led to the recent development of binaural reproduction methods. However, only a few of these methods are well-suited for wearable and mobile arrays, which typically consist of a small number of microphones. One such method is binaural signal matching (BSM), which has been shown to produce high-quality binaural signals for wearable arrays. However, BSM may be suboptimal in cases of high direct-to-reverberant ratio (DRR), as it is based on the diffuse sound field assumption. To overcome this limitation, previous studies incorporated sound-field models other than diffuse. However, performance may be sensitive to signal estimation errors. This paper aims to provide a systematic and comprehensive analysis of signal-dependent vs. signal-independent BSM, so that the benefits and limitations of the methods become clearer. Two signal-dependent BSM-based methods designed for high-DRR scenarios, which incorporate a sound field model composed of direct and reverberant components, are investigated mathematically and through simulations, validated by a listening test, and compared to the signal-independent BSM. The results show that signal-dependent BSM can significantly improve performance, in particular in the direction of the source, while presenting only a negligible degradation in other directions. Furthermore, when source direction estimation is inaccurate, the performance of the signal-dependent BSM degrades to equal that of the signal-independent BSM, presenting a desirable robustness quality.
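
To make the diffuse-field assumption mentioned in the abstract concrete, the following is a minimal sketch of a signal-independent matching filter at a single frequency. It is not the paper's implementation. It assumes the standard regularized least-squares (MMSE) solution for uncorrelated, equal-power plane waves and white sensor noise. The names bsm_filter and nmse, the random steering matrix V and the random HRTF vector h are all hypothetical placeholders for measured array and HRTF data.

```python
import numpy as np

def bsm_filter(V, h, snr_db=20.0):
    """Regularized least-squares matching filter for one ear (illustrative).

    V : (M, Q) complex steering matrix, M microphones, Q plane-wave directions.
    h : (Q,) complex HRTF of that ear for the same Q directions.
    Assumes a diffuse field (source covariance proportional to identity) and
    white sensor noise, so the regularizer is sigma_n^2 / sigma_s^2.
    """
    M = V.shape[0]
    reg = 10.0 ** (-snr_db / 10.0)
    A = V @ V.conj().T + reg * np.eye(M)
    # Minimizes E|c^H x - h^T s|^2 for x = V s + n.
    return np.linalg.solve(A, V @ h.conj())

def nmse(V, h, c):
    """Normalized matching error ||V^H c - h^*||^2 / ||h||^2 (noise-free part)."""
    err = V.conj().T @ c - h.conj()
    return float(np.linalg.norm(err) ** 2 / np.linalg.norm(h) ** 2)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    M, Q = 4, 60  # few microphones, many directions (placeholder sizes)
    V = rng.standard_normal((M, Q)) + 1j * rng.standard_normal((M, Q))
    h = rng.standard_normal(Q) + 1j * rng.standard_normal(Q)
    c = bsm_filter(V, h, snr_db=20.0)
    print("NMSE [dB]:", 10.0 * np.log10(nmse(V, h, c)))
```

With far fewer microphones than directions, the filter cannot match the HRTF in every direction at once, which is the kind of limitation a signal-dependent design tries to relax by weighting the source direction more heavily.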

Paper Structure (23 sections, 31 equations, 8 figures, 1 table)

Figures (8)

  • Figure 1: An illustration of a virtual listener head and the semi-circular array for array rotations of $0^\circ$, $50^\circ$, and $90^\circ$. The blue dots represent the array microphones, and the $x$ and $y$ grey arrows represent the positive $x$ and $y$ axes, respectively.
  • Figure 2: The NMSE of the BSM method for the right ear (top) and the left ear (bottom). The NMSE values are computed as defined in the paper's NMSE section, with reference to a diffuse sound field as detailed in ref21. The evaluations consider an SNR of $20\,$dB and three head rotations: $90^\circ$, $50^\circ$, and $0^\circ$ (no head rotation).
  • Figure 3: The ITD (a) and ILD (b) evaluated using the direct sound component of scenario 1 in the paper's table of simulation scenarios. The orange line represents the BSM method, the purple line corresponds to the d-BSM approach, and the green line depicts the COM approach. ITD and ILD errors are also presented, as defined in the paper's ITD and ILD section. No head rotation was employed.
  • Figure 4: The ITD (left) and ILD (right) evaluated using the direct sound component of the three scenarios described in the paper's table of simulation scenarios. The orange line represents the BSM method, the purple line corresponds to the d-BSM approach, and the green line depicts the COM approach. ITD and ILD errors are also presented, as defined in the paper's ITD and ILD section. A $50^\circ$ head rotation was employed.
  • Figure 5: The ITD (a) and ILD (b) evaluated using the direct sound component of scenario 1 in the paper's table of simulation scenarios. The orange line represents the BSM method, the purple line corresponds to the d-BSM approach, and the green line depicts the COM approach. ITD and ILD errors are also presented, as defined in the paper's ITD and ILD section. A DOA estimation error of $\Omega_{err}=(\phi=10^\circ,\theta=0^\circ)$ and a $50^\circ$ head rotation were employed.
  • ...and 3 more figures