Why Not Put a Microphone Near the Loudspeaker? A New Paradigm for Acoustic Echo Cancellation
Fei Zhao, Zhong-Qiu Wang
TL;DR
This paper tackles nonlinear distortions in acoustic echo cancellation (AEC) by introducing a dual-microphone setup in which an auxiliary reference microphone, placed near the loudspeaker, captures the nonlinearly distorted far-end signal. A Wiener-filter-based preprocessing stage purifies this reference by suppressing near-end contamination, which enables a linear AEC stage; a neural AEC module then jointly suppresses residual echo and noise. The approach combines a weighted short-time Wiener solution for the linear path, a masked-reference refinement, and an ICCRN-based neural module trained with a composite RI+Mag and S-SISNR loss. Experiments on matched and mismatched nonlinear scenarios show robust performance across PESQ, SDR, and ERLE, outperforming baseline approaches and supporting the practicality of an auxiliary reference microphone and a refined reference signal for real-world AEC under unknown nonlinearities.
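To make the linear path concrete, here is a minimal NumPy sketch of a per-frequency, multi-tap Wiener (least-squares) echo canceller operating on STFT frames. It assumes the purified reference `R` and the primary-microphone signal `Y` are already complex STFT arrays of shape (frequency, frame). The helper names `stack_taps` and `wiener_linear_aec`, the tap count `taps`, and the optional per-bin `weights` are hypothetical illustrations, not the authors' weighting scheme or code. The sketch fits one filter over the whole signal; a short-time variant would instead re-estimate it over sliding windows of frames.

```python
import numpy as np


def stack_taps(X, taps):
    """Hypothetical helper: stack the current and previous `taps - 1` frames of
    X (freq, frames) into shape (freq, frames, taps); missing past frames are zero."""
    F, T = X.shape
    assert taps <= T, "need at least as many frames as taps"
    out = np.zeros((F, T, taps), dtype=complex)
    for k in range(taps):
        out[:, k:, k] = X[:, :T - k]  # tap k holds the frame delayed by k
    return out


def wiener_linear_aec(Y, R, taps=4, weights=None, eps=1e-8):
    """Hypothetical per-frequency multi-tap Wiener echo canceller.

    Y: primary-mic STFT (F, T); R: reference STFT (F, T).
    weights: optional real per-bin weights (F, T) for a weighted
             least-squares fit; an assumption, not the paper's scheme.
    Returns the error signal E = Y - (filtered R) and the filters H (F, taps).
    """
    F, T = Y.shape
    if weights is None:
        weights = np.ones((F, T))
    Rs = stack_taps(R, taps)                 # (F, T, taps)
    E = np.empty_like(Y, dtype=complex)
    H = np.empty((F, taps), dtype=complex)
    for f in range(F):
        A = Rs[f]                            # (T, taps) delayed reference frames
        w = weights[f][:, None]              # (T, 1) per-frame weights
        # Weighted normal equations: (A^H W A + eps I) h = A^H W y
        cov = A.conj().T @ (w * A) + eps * np.eye(taps)
        xcorr = A.conj().T @ (w[:, 0] * Y[f])
        h = np.linalg.solve(cov, xcorr)
        H[f] = h
        E[f] = Y[f] - A @ h                  # subtract the linear echo estimate
    return E, H
```

Passing a non-uniform `weights` array (for example, one that down-weights frames with strong near-end activity) turns the fit into a weighted least-squares problem; how the paper actually chooses its weights is not specified in this summary.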
Abstract
Acoustic echo cancellation (AEC) remains challenging in real-world environments due to nonlinear distortions caused by low-cost loudspeakers and complex room acoustics. To mitigate these issues, we introduce a dual-microphone configuration, where an auxiliary reference microphone is placed near the loudspeaker to capture the nonlinearly distorted far-end signal. Because this reference signal is also contaminated by near-end speech, we propose a Wiener-filtering-based preprocessing module that estimates a compressed time-frequency mask to suppress the near-end components. This purified reference signal enables a more effective linear AEC stage, whose residual error signal is then fed to a deep neural network for joint residual echo and noise suppression. Evaluation results show that our method outperforms baseline approaches on matched test sets. To evaluate its robustness under strong nonlinearities, we further test it on a mismatched dataset and observe substantial performance gains. These results demonstrate its effectiveness in practical scenarios where the nonlinear distortions are typically unknown.
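The reference-purification step can be illustrated with a similarly hedged sketch. It assumes access to the far-end (loudspeaker input) STFT `X`, fits a one-tap Wiener gain per frequency to predict the far-end component of the reference-microphone STFT `R`, and turns the predicted-to-observed magnitude ratio into a mask compressed by a power-law exponent. The function name `purify_reference`, the one-tap filter, and the exponent `alpha` are assumptions for illustration; the paper's actual mask estimator may differ.

```python
import numpy as np


def purify_reference(R, X, alpha=0.5, eps=1e-8):
    """Hypothetical sketch: suppress near-end speech in the reference-mic STFT.

    R: reference-mic STFT (F, T) = distorted far-end echo + near-end leakage.
    X: far-end (loudspeaker input) STFT (F, T).
    alpha: assumed power-law compression exponent for the mask.
    Returns the masked (purified) reference and the mask itself.
    """
    # One-tap Wiener gain per frequency: h_f = sum_t X* R / sum_t |X|^2
    h = np.sum(X.conj() * R, axis=1) / (np.sum(np.abs(X) ** 2, axis=1) + eps)
    D_hat = h[:, None] * X                      # linear estimate of the far-end part of R
    # Magnitude-ratio mask clipped to [0, 1], then power-law compressed
    ratio = np.abs(D_hat) / (np.abs(R) + eps)
    mask = np.clip(ratio, 0.0, 1.0) ** alpha
    return mask * R, mask
```

An exponent below 1 pushes mask values toward 1, trading some near-end suppression for fewer attenuated echo bins. This choice matters here because a linear predictor cannot capture the nonlinear part of the echo that the reference microphone is meant to preserve.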
