FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation
Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei
TL;DR
The paper tackles acoustic echo cancellation with diffusion-based approaches that produce probabilistic, high-quality estimates of clean speech. It proposes DI-AEC and an efficient variant, FADI-AEC, which uses far-end guided noise and a fast score computation performed once per frame to reduce the computational load on edge devices. A far-end conditioned score model and the fast score formulation enable stable reconstruction with markedly lower latency while preserving or improving quality metrics such as ERLE and PESQ. Evaluations on the ICASSP 2023 Microsoft deep echo cancellation challenge evaluation dataset show that the method outperforms some end-to-end methods and other diffusion-based echo cancellation methods, highlighting its practical value for real-time, resource-constrained scenarios.
Abstract
Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been limited. In this paper, we propose DI-AEC, a pioneering diffusion-based stochastic regeneration approach dedicated to AEC. We further propose FADI-AEC, a fast score-based diffusion AEC framework that reduces computational demands, making it suitable for edge devices. FADI-AEC runs the score model only once per frame, which substantially improves processing efficiency. In addition, we introduce a novel noise generation technique that uses the far-end signal, so that both far-end and near-end signals are incorporated to improve the score model's accuracy. We evaluate the proposed method on the ICASSP 2023 Microsoft deep echo cancellation challenge evaluation dataset, where it outperforms some end-to-end methods and other diffusion-based echo cancellation methods.
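To make the efficiency argument concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes that "once per frame" means the score is evaluated a single time and reused across all reverse steps, and that far-end guided noise is a blend of the far-end frame with Gaussian noise; neither detail is specified in the abstract. The names `CountingScore`, `far_end_guided_noise`, `denoise_frame`, the `mix` parameter, the 0.5 echo gain, and the step size are all hypothetical.

```python
import random


class CountingScore:
    """Hypothetical stand-in for a trained score network; counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, mic, far_end, sigma):
        self.calls += 1
        # Toy Gaussian score pulling x toward a crude echo-reduced target.
        # The 0.5 echo gain is an arbitrary illustrative value.
        return [-(xi - (mi - 0.5 * fi)) / sigma**2
                for xi, mi, fi in zip(x, mic, far_end)]


def far_end_guided_noise(far_end, sigma, mix, rng):
    """Assumed form: blend the far-end frame with Gaussian noise, scaled by sigma."""
    return [sigma * (mix * f + (1.0 - mix) * rng.gauss(0.0, 1.0))
            for f in far_end]


def denoise_frame(score, mic, far_end, sigmas, rng, reuse_score):
    # Start from the microphone frame perturbed by far-end guided noise.
    noise = far_end_guided_noise(far_end, sigmas[0], mix=0.5, rng=rng)
    x = [m + n for m, n in zip(mic, noise)]
    # Fast variant (assumed): evaluate the score once and reuse it below.
    cached = score(x, mic, far_end, sigmas[0]) if reuse_score else None
    for sigma in sigmas:
        s = cached if reuse_score else score(x, mic, far_end, sigma)
        step = 0.1 * sigma**2  # illustrative step size
        x = [xi + step * si for xi, si in zip(x, s)]
    return x


rng = random.Random(0)
# Four frames of eight samples each: (microphone frame, far-end frame).
frames = [([rng.uniform(-1, 1) for _ in range(8)],
           [rng.uniform(-1, 1) for _ in range(8)]) for _ in range(4)]
sigmas = [1.0, 0.7, 0.4, 0.2, 0.1]  # five reverse steps per frame

standard, fast = CountingScore(), CountingScore()
for mic, far in frames:
    denoise_frame(standard, mic, far, sigmas, rng, reuse_score=False)
    denoise_frame(fast, mic, far, sigmas, rng, reuse_score=True)

print(standard.calls, fast.calls)  # prints "20 4"
```

With four frames and five reverse steps, the per-step variant calls the score model 20 times and the reuse variant 4 times, so in this sketch the saving scales with the number of reverse steps. The toy says nothing about output quality, which depends on the trained score model.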
