Online neural fusion of distortionless differential beamformers for robust speech enhancement
Yuanhang Qian, Kunlong Zhao, Jilu Jin, Xueqin Luo, Gongping Huang, Jingdong Chen, Jacob Benesty
TL;DR
This work addresses robust speech enhancement under dynamic acoustic conditions by fusing the outputs of multiple fixed distortionless beamformers with an online neural network. The proposed BeamFusion framework predicts frame-level fusion weights that a softmax constrains to be nonnegative and sum to one, so the fused output keeps the distortionless response of the individual beamformers; this enables real-time adaptation to moving interference while preserving the target speech. Empirical results show BeamFusion outperforming both the individual beamformers and adaptive convex combination (ACC) across reverberation levels and interference scenarios, with improved SNR, SI-SDR, and SIR. Because it does not require explicit estimation of noise statistics, the approach is suited to real-time, high-fidelity speech enhancement in non-stationary environments.
Abstract
Fixed beamforming is widely used in practice since it does not depend on the estimation of noise statistics and provides relatively stable performance. However, a single beamformer cannot adapt to varying acoustic conditions, which limits its interference suppression capability. To address this, adaptive convex combination (ACC) algorithms have been introduced, where the outputs of multiple fixed beamformers are linearly combined to improve robustness. Nevertheless, ACC often fails in highly non-stationary scenarios, such as rapidly moving interference, since its adaptive updates cannot reliably track such changes. To overcome this limitation, we propose a frame-online neural fusion framework for multiple distortionless differential beamformers, which estimates the combination weights through a neural network. Compared with conventional ACC, the proposed method adapts more effectively to dynamic acoustic environments, achieving stronger interference suppression while maintaining the distortionless constraint.
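To illustrate why a convex combination keeps the distortionless constraint, the following is a minimal sketch for a single frame and frequency bin. It is not the authors' implementation: the steering vector, the three filter coefficient sets, and the logits (standing in for the network output) are hypothetical values chosen for illustration, and the fusion is applied to the filters rather than to the beamformer outputs, which is equivalent by linearity. If each filter h_k satisfies h_k^H d = 1 and the weights sum to one, the fused filter also satisfies h^H d = 1.

```python
import cmath
import math


def softmax(logits):
    """Numerically stable softmax: outputs are positive and sum to one."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def inner(h, d):
    """Return h^H d for two complex vectors given as lists."""
    return sum(hi.conjugate() * di for hi, di in zip(h, d))


def make_distortionless(h, d):
    """Rescale h so that h^H d = 1 (unit gain toward the target)."""
    g = inner(h, d)
    # (h / conj(g))^H d = (h^H d) / g = 1
    return [hi / g.conjugate() for hi in h]


def fuse(filters, logits):
    """Convex combination of filters with softmax weights."""
    alpha = softmax(logits)
    num_mics = len(filters[0])
    fused = [
        sum(a * h[m] for a, h in zip(alpha, filters))
        for m in range(num_mics)
    ]
    return fused, alpha


# Hypothetical steering vector for a 4-microphone uniform linear array.
d = [cmath.exp(-1j * 0.7 * m) for m in range(4)]

# Hypothetical fixed beamformers, each rescaled to be distortionless.
raw_filters = [
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [1, -0.5, 0, 0],
]
filters = [make_distortionless([complex(c) for c in h], d) for h in raw_filters]

# Hypothetical logits standing in for the network output at one frame.
logits = [0.3, -1.2, 2.0]

h_fused, alpha = fuse(filters, logits)
response = inner(h_fused, d)  # gain toward the target direction

print("fusion weights:", alpha)
print("target response magnitude:", abs(response))
```

In this sketch the target response stays at unity for any choice of logits, so a network that changes the weights from frame to frame can only trade off how interference and noise are suppressed, not the gain applied to the target direction.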
