
A Hybrid Approach for Low-Complexity Joint Acoustic Echo and Noise Reduction

Shrishti Saha Shetu, Naveen Kumar Desiraju, Jose Miguel Martinez Aponte, Emanuël A. P. Habets, Edwin Mabande

TL;DR

The paper tackles the high computational burden of deep learning approaches to joint acoustic echo and noise reduction (AENR) by proposing a hybrid two-stage system. A Kalman filter (KF) preprocessor estimates the echo, and a modified ULCNet-based post-filter processes three STFT inputs to jointly suppress residual echo and noise at ultra-low complexity. The authors introduce a three-input ULCNet, a diagonalized KF, and a channel-wise sub-band reorientation with power-law compression, which together enable joint AENR while keeping memory and compute requirements extremely low. The model is trained on 1100 hours of data per task at 16 kHz and evaluated on Interspeech/ICASSP benchmarks. Results show competitive acoustic echo reduction (AER) performance with favorable efficiency (up to 4× less compute and up to 10× smaller models than SOTA methods) and reasonable noise reduction (NR) performance, with perceptual quality preserved in listening tests, making the approach suitable for real-time deployment on embedded devices.
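The summary does not spell out how the diagonalized KF works. As a minimal sketch of the general idea, assuming a one-tap-per-bin echo-path model and fixed, hypothetical noise variances (`a`, `q`, `r`), each STFT bin can run its own scalar Kalman filter so the state covariance stays diagonal. The class `DiagonalKalmanEchoEstimator` is hypothetical and is not the authors' implementation.

```python
class DiagonalKalmanEchoEstimator:
    """Hypothetical per-bin (diagonalized) Kalman filter for echo estimation.

    Assumed model for each STFT bin k and frame t:
        W_k(t) = a * W_k(t-1) + process noise (variance q)
        Y_k(t) = X_k(t) * W_k(t) + near-end speech/noise (variance r)
    Bins are filtered independently, so the state covariance is diagonal
    and each bin only needs a scalar error variance P_k.
    """

    def __init__(self, num_bins, a=0.999, q=1e-4, r=1e-2):
        # a, q, r are illustrative values, not taken from the paper.
        self.a, self.q, self.r = a, q, r
        self.w = [0j] * num_bins   # echo-path estimate per bin
        self.p = [1.0] * num_bins  # error variance per bin

    def step(self, x_frame, y_frame):
        """Process one frame of far-end reference X and microphone Y spectra.

        Returns (echo_estimate, error_spectrum), one complex value per bin.
        """
        echo, err = [], []
        for k, (x, y) in enumerate(zip(x_frame, y_frame)):
            # Prediction step
            w_pred = self.a * self.w[k]
            p_pred = self.a ** 2 * self.p[k] + self.q
            # Innovation (a priori error) and its variance
            innov = y - x * w_pred
            s = abs(x) ** 2 * p_pred + self.r
            # Kalman gain and correction step
            gain = p_pred * x.conjugate() / s
            self.w[k] = w_pred + gain * innov
            self.p[k] = p_pred - p_pred ** 2 * abs(x) ** 2 / s
            # A posteriori echo estimate and echo-reduced error spectrum
            d_hat = x * self.w[k]
            echo.append(d_hat)
            err.append(y - d_hat)
        return echo, err
```

Calling `step` once per STFT frame with the far-end and microphone spectra yields an echo estimate and an echo-reduced error spectrum per bin. Practical frequency-domain Kalman filters for echo control typically use several taps per bin and estimate the near-end variance adaptively; the fixed `r` here only keeps the sketch short. Which three STFT signals the paper actually feeds to the ULCNet post-filter is not stated in this summary.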

Abstract

Deep learning-based methods that jointly perform the task of acoustic echo and noise reduction (AENR) often require high memory and computational resources, making them unsuitable for real-time deployment on low-resource platforms such as embedded devices. We propose a low-complexity hybrid approach for joint AENR by employing a single model to suppress both residual echo and noise components. Specifically, we integrate the state-of-the-art (SOTA) ULCNet model, which was originally proposed to achieve ultra-low complexity noise suppression, in a hybrid system and train it for joint AENR. We show that the proposed approach achieves better echo reduction and comparable noise reduction performance with much lower computational complexity and memory requirements than all considered SOTA methods, at the cost of slight degradation in speech quality.

Paper Structure (6 sections, 2 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: Flow diagram of the proposed method
  • Figure 2: Modified channel-wise feature reorientation
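Figure 2 and the TL;DR mention a channel-wise sub-band reorientation with power-law compression, but the summary gives no parameters. The sketch below assumes a compression exponent `alpha` (0.3 by default, an assumed value), equal-width non-overlapping sub-bands, and a bin count divisible by the number of sub-bands; all function names are hypothetical. It compresses each input's STFT magnitudes for one frame and stacks the resulting sub-bands of all inputs along the channel axis.

```python
def power_law_compress(spectrum, alpha=0.3):
    """Return |X|**alpha for each complex STFT bin (alpha is an assumed exponent)."""
    return [abs(v) ** alpha for v in spectrum]


def reorient_subbands(features, num_subbands):
    """Split one frame's frequency features into equal, non-overlapping
    sub-bands and return them as separate channels."""
    num_bins = len(features)
    if num_bins % num_subbands:
        raise ValueError("bin count must be divisible by num_subbands")
    width = num_bins // num_subbands
    return [features[c * width:(c + 1) * width] for c in range(num_subbands)]


def build_post_filter_input(spectra, alpha=0.3, num_subbands=4):
    """Hypothetical input builder: compress each input spectrum (e.g. the
    three STFT inputs of the post-filter), reorient it into sub-band channels,
    and concatenate all channels. Output shape:
    (len(spectra) * num_subbands, num_bins // num_subbands)."""
    channels = []
    for spectrum in spectra:
        compressed = power_law_compress(spectrum, alpha)
        channels.extend(reorient_subbands(compressed, num_subbands))
    return channels
```

For example, three 8-bin frames with `num_subbands=4` yield 12 channels of width 2. Stacking sub-bands as channels shortens the frequency axis each convolution has to cover, which is one way such a reorientation can reduce compute; whether the paper's modified variant overlaps sub-bands or handles the three inputs differently is not stated here.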