Neural Kalman Filters for Acoustic Echo Cancellation

Ernst Seidel, Gerald Enzner, Pejman Mowlaee, Tim Fingscheidt

TL;DR

The paper addresses acoustic echo cancellation (AEC) in hands-free systems, focusing on tracking time-varying echo paths under double-talk and nonlinear loudspeaker distortions. It revisits the frequency-domain adaptive Kalman filter (FDKF) as a model-based backbone and surveys neural Kalman filter hybrids that use DNNs in the time-frequency domain to replace or augment the Kalman gain, the state transition, and the distortion modeling. Across multiple architectures, it demonstrates that per-bin DNNs for the Kalman gain or state update can yield faster convergence and better near-end (NE) speech preservation than the standard FDKF, while fully data-driven approaches may struggle with NE speech quality or resource constraints. The results offer design guidance for hybrid AEC systems, highlighting per-bin processing and selective distortion modeling as keys to balancing convergence, robustness to nonlinearity, and computational footprint.

Abstract

Kalman filtering is a powerful approach to adaptive filtering for various problems in signal processing. The frequency-domain adaptive Kalman filter (FDKF), based on the concept of the acoustic state space, provides a unifying solution to the adaptive filter update and the related stepsize control. It was conceived for the problem of acoustic echo cancellation and, as such, is frequently applied in hands-free systems. This article motivates and briefly recapitulates the linear FDKF and investigates how it can be further supported by deep neural networks (DNNs) in various ways, specifically to overcome the challenges and limitations related to the usually required estimation of process and observation noise covariances for the Kalman filter. While the mere FDKF comes with very low computational complexity, its neural Kalman filter variants may deliver faster (re)convergence, better echo cancellation, and even exceed the FDKF in its excellent double-talk near-end speech preservation both under linear and nonlinear loudspeaker conditions. To provide a synopsis of the state of the art, this article contributes a comparison of a range of DNN-based extensions of FDKF in the same training framework and using the same data.
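To make the role of the process and observation noise covariances concrete, the following is a minimal sketch of a textbook scalar Kalman filter that tracks one complex echo-path coefficient in a single frequency bin. It is not the paper's FDKF: it ignores the overlap-save constraints and uses a single tap per bin. The names `kalman_bin_step` and `run_demo`, the transition factor `a`, and the fixed values `q` (process noise variance) and `r` (observation noise variance) are all assumptions made for illustration. In the FDKF these covariances have to be estimated, and this is the part that the neural variants aim to support or replace.

```python
import random


def kalman_bin_step(w, p, x, y, a=0.999, q=1e-4, r=2e-2):
    """One predict/update step of a scalar complex Kalman filter.

    w: current echo-path estimate for this bin (complex)
    p: current state error variance (real)
    x: far-end reference spectrum value (complex)
    y: microphone spectrum value (complex)
    a, q, r: hypothetical transition factor, process and observation
             noise variances (fixed here; assumptions of this sketch).
    """
    # Prediction with a first-order Markov model of the echo path.
    w_pred = a * w
    p_pred = a * a * p + q
    # Innovation: microphone signal minus predicted echo.
    e = y - x * w_pred
    # Kalman gain, which also acts as a frequency-dependent step size.
    k = p_pred * x.conjugate() / (abs(x) ** 2 * p_pred + r)
    # Correction of the state and of its error variance.
    w_new = w_pred + k * e
    p_new = (1.0 - (k * x).real) * p_pred  # k * x is real by construction
    return w_new, p_new, e


def run_demo(num_frames=500, seed=0):
    """Track a fixed, made-up echo path from noisy observations."""
    rng = random.Random(seed)
    w_true = complex(0.6, -0.3)  # hypothetical echo-path coefficient
    w, p = 0j, 1.0
    for _ in range(num_frames):
        x = complex(rng.gauss(0, 1), rng.gauss(0, 1))
        # Near-end speech and noise act as observation noise (variance 0.02).
        s = 0.1 * complex(rng.gauss(0, 1), rng.gauss(0, 1))
        y = x * w_true + s
        w, p, _ = kalman_bin_step(w, p, x, y)
    return w, w_true


if __name__ == "__main__":
    w_est, w_ref = run_demo()
    print("estimate:", w_est, "true:", w_ref)
```

In this sketch, a larger `r` (for example during double-talk) shrinks the gain and protects the estimate, while a larger `q` enlarges the gain and speeds up reconvergence after an echo-path change. Fixed values cannot serve both cases well, which motivates estimating them adaptively or learning the gain directly.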

Paper Structure

This paper contains 24 sections, 17 equations, 7 figures, 1 table.

Figures (7)

  • Figure 1: Generalized overview of a hands-free system / speakerphone. The speech signal of the far-end speaker is played out of a loudspeaker at the near-end, picked up at the microphone alongside near-end speech and background noise. An acoustic echo canceller and a postfilter (PF) aim at removing the echo and the background noise.
  • Figure 2: Overview of a (neural) Kalman filter approach for acoustic echo cancellation. It operates in the discrete Fourier transform (DFT) domain and is constrained to overlap-add (OLA) or overlap-save (OLS) processing of the frequency bins. Notations are for the general case of multiple filter taps (bold fonts). Note that block "T" represents a delay unit.
  • Figure 3: Details of the Kalman algorithm block as used in the NeuralKalman (Zhang2023, left) and DeepAdaptive (Zhang2022d, right) solutions. Red blocks are realized by a DNN, while the filter-state update of NeuralKalman is partially DNN-based.
  • Figure 4: Model performance for an example file from test set $\mathcal{D}_{\mathrm{test}}$, represented by echo return loss enhancement (ERLE) over time (bottom panel). Near-end speech $s(n)$ (top panel) and far-end echo $d(n)$ (center panel) are mixed at a signal-to-echo ratio (SER) of $0$ dB.
  • Figure 5: Model performance averaged over the single-talk far-end (STFE) sections of all files with far-end speech excitation in test set $\mathcal{D}_{\mathrm{test}}$. No nonlinearities are employed, but the room impulse response (RIR) is switched after 4 s.
  • ...and 2 more figures
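
The captions of Figures 4 and 5 rely on two level measures, ERLE and SER. As a rough sketch under a common textbook definition (the paper's exact evaluation procedure is not given in this excerpt), ERLE can be taken as the ratio of echo power to residual echo power in dB, and a target SER can be set by scaling the near-end speech relative to the echo. The function names `mean_power`, `erle_db`, and `scale_to_ser` are hypothetical.

```python
import math


def mean_power(signal):
    """Average power of a sequence of real-valued samples."""
    return sum(v * v for v in signal) / len(signal)


def erle_db(echo, residual, eps=1e-12):
    """ERLE in dB: echo power over residual echo power (assumed definition)."""
    return 10.0 * math.log10((mean_power(echo) + eps) / (mean_power(residual) + eps))


def scale_to_ser(speech, echo, ser_db):
    """Scale near-end speech so that the speech-to-echo power ratio equals ser_db."""
    p_speech = mean_power(speech)
    if p_speech == 0.0:
        raise ValueError("near-end speech has zero power")
    gain = math.sqrt(mean_power(echo) / p_speech * 10.0 ** (ser_db / 10.0))
    return [gain * v for v in speech]


if __name__ == "__main__":
    echo = [1.0, -1.0, 1.0, -1.0]
    residual = [0.1, -0.1, 0.1, -0.1]  # echo left after cancellation
    print("ERLE [dB]:", erle_db(echo, residual))
    print("speech at 0 dB SER:", scale_to_ser([2.0, -2.0], echo, 0.0))
```

For a curve over time like the one in Figure 4, the same ratio would be evaluated on short sliding windows or with recursive smoothing rather than on the whole signal.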