Interpretable Binaural Deep Beamforming Guided by Time-Varying Relative Transfer Function

Ilai Zaidel; Sharon Gannot

TL;DR

A deep beamformer guided by a continuously tracked relative transfer function (RTF) yields smoother, more spatially consistent beampatterns that track the target direction of arrival (DOA), whereas the unguided model fails to maintain a clear spatial focus.

Abstract

In this work, we propose a deep beamforming framework for speech enhancement in dynamic acoustic environments. The framework learns time-varying beamformer weights from noisy multichannel signals via a deep neural network, guided by a continuously tracked relative transfer function (RTF) of a moving target speaker. We analyze the network's spatial behavior on an 8-microphone linear array by evaluating narrowband and wideband beampatterns in three modes: (i) oracle guidance with true RTFs, (ii) guidance with subspace-tracked RTF estimates, and (iii) operation without RTF guidance. Results show that RTF guidance yields smoother, more spatially consistent beampatterns that track the target direction of arrival (DOA), whereas the unguided model fails to maintain a clear spatial focus. We further extend the framework to binaural beamforming for dynamic target-speaker enhancement. The system is trained using a head-related transfer function (HRTF)-based acoustic simulation of a moving source, enabling realistic spatial rendering at the left and right ears. Spatial cue preservation is quantitatively evaluated in terms of interaural level differences (ILD) and interaural time differences (ITD), demonstrating the method's suitability for hearable applications.
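To make the beampattern analysis concrete, the following minimal sketch evaluates a narrowband beampattern for time-varying weights on an 8-microphone uniform linear array. It is an illustration under stated assumptions, not the authors' implementation: the RTF is modeled as a free-field, far-field steering vector, the microphone spacing (0.04 m), the frequency (2 kHz), and the DOA track are hypothetical, and a simple matched (distortionless) weight vector stands in for the weights that the network would produce. All function names are illustrative.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)


def far_field_rtf(doa_deg, freq_hz, n_mics=8, spacing_m=0.04):
    """Free-field RTF of a uniform linear array, relative to microphone 0.

    Assumption: plane-wave propagation without reverberation.
    The spacing is a hypothetical value, not taken from the paper.
    """
    positions = np.arange(n_mics) * spacing_m
    delays = positions * np.cos(np.deg2rad(doa_deg)) / C
    return np.exp(-2j * np.pi * freq_hz * delays)  # first entry equals 1


def matched_weights(rtf):
    """Stand-in for the network output: w = a / (a^H a).

    This satisfies the distortionless constraint w^H a = 1 toward the RTF.
    """
    return rtf / np.vdot(rtf, rtf).real


def narrowband_beampattern(weights, freq_hz, angles_deg, spacing_m=0.04):
    """Power response |w^H a(theta)|^2 in dB for each candidate angle."""
    n_mics = weights.shape[0]
    steering = np.stack(
        [far_field_rtf(a, freq_hz, n_mics, spacing_m) for a in angles_deg],
        axis=1,
    )  # shape: (n_mics, n_angles)
    response = weights.conj() @ steering
    return 20.0 * np.log10(np.abs(response) + 1e-12)


angles = np.arange(0, 181)          # candidate directions in degrees
target_track = [60.0, 75.0, 90.0]   # hypothetical target DOA per frame
freq_hz = 2000.0                    # hypothetical analysis frequency

peaks = []
for doa in target_track:
    w = matched_weights(far_field_rtf(doa, freq_hz))
    pattern_db = narrowband_beampattern(w, freq_hz, angles)
    peaks.append(int(angles[np.argmax(pattern_db)]))

print(peaks)
```

In this idealized setting the beampattern peak follows the assumed DOA track frame by frame, which is the behavior the abstract attributes to the RTF-guided modes. A wideband beampattern could be approximated by averaging such narrowband patterns over frequency bins.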
