Integrated Minimum Mean Squared Error Algorithms for Combined Acoustic Echo Cancellation and Noise Reduction

Arnout Roebben, Toon van Waterschoot, Jan Wouters, Marc Moonen

TL;DR

This work tackles the problem of jointly suppressing acoustic echo and near-end noise in multi-microphone/multi-loudspeaker setups by formulating a single MMSE objective. It derives an extended multi-channel Wiener filter (MWFext) using an extended signal model, and shows that MWFext is theoretically equivalent to cascade algorithms such as AEC-NR, NR-AEC, and NRext-AEC-PF under certain conditions, including rank constraints. Practical performance differences arise from non-stationarities and correlation-matrix estimation errors, with AEC-NR and NRext-AEC-PF generally delivering the best overall results. The paper provides a comprehensive framework, including practical considerations, computational complexity, and experimental validation (Setup-1 and Setup-2), and connects model-based approaches with data-driven opportunities to enhance real-world performance.

Abstract

In many speech recording applications, noise and acoustic echo corrupt the desired speech. Consequently, combined noise reduction (NR) and acoustic echo cancellation (AEC) is required. Generally, a cascade approach is followed, i.e., the AEC and NR are designed in isolation by selecting a separate signal model, separate cost function, and separate solution strategy. The AEC and NR are then cascaded one after the other, not accounting for their interaction. In this paper, an integrated approach is proposed to consider this interaction in a general multi-microphone/multi-loudspeaker setup. To this end, a single signal model of either the microphone signal vector or the extended signal vector, obtained by stacking microphone and loudspeaker signals, is selected, a single mean squared error cost function is formulated, and a common solution strategy is used. Using the microphone signal model, a multi-channel Wiener filter (MWF) is derived. Using the extended signal model, it is shown that an extended MWF (MWFext) can be derived, and several equivalent expressions can be found, which are nevertheless shown to be interpretable as cascade algorithms. Specifically, the MWFext is shown to be equivalent to algorithms where the AEC precedes the NR (AEC-NR), the NR precedes the AEC (NR-AEC), and the extended NR (NRext) precedes the AEC and post-filter (PF) (NRext-AEC-PF). Under rank-deficiency conditions, the MWFext is non-unique, and equivalence then means that each expression is a specific, not necessarily minimum-norm, solution for this MWFext. In practice, the performances differ due to non-stationarities and imperfect correlation matrix estimation, with the AEC-NR and NRext-AEC-PF attaining the best overall performance.
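The difference between the two signal models can be illustrated with a minimal NumPy sketch. It uses the textbook Wiener solution $w = R_{xx}^{+} R_{ss} e_r$ rather than the paper's own derivation, and everything concrete in it is an assumption made for illustration: the function names, the synthetic single-snapshot data, the steering vector `a`, the linear echo paths `F`, and the oracle access to the speech component used to estimate $R_{ss}$.

```python
import numpy as np

def sample_correlation(x):
    """Sample estimate of E[x x^H] for x of shape (channels, frames)."""
    return x @ x.conj().T / x.shape[1]

def wiener_filter(R_xx, R_ss, ref=0):
    """Filter w minimising E|s_ref - w^H x|^2 when x = s + terms uncorrelated with s.

    The pseudo-inverse yields the minimum-norm solution if R_xx is rank deficient.
    """
    return np.linalg.pinv(R_xx) @ R_ss[:, ref]

def simulate(n_frames=20000, seed=0):
    """Hypothetical scene: 4 microphones, 1 loudspeaker, linear additive echo."""
    rng = np.random.default_rng(seed)

    def cn(*shape):  # unit-variance circular complex Gaussian samples
        return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

    a = np.ones((4, 1))                          # assumed speech steering vector
    F = np.array([[1.0], [0.8], [0.6], [0.4]])   # assumed echo paths
    s = a @ cn(1, n_frames)                      # desired speech at the microphones
    l = cn(1, n_frames)                          # loudspeaker signal
    e = F @ l                                    # echo at the microphones
    n = 0.3 * cn(4, n_frames)                    # near-end noise
    return s, e, n, l

s, e, n, l = simulate()
m = s + e + n                                    # microphone signal vector
ref = 0
M, L = m.shape[0], l.shape[0]

# Oracle speech correlation matrix (in practice it has to be estimated, e.g. with a VAD).
R_ss = sample_correlation(s)

# MWF: microphone signal model only.
w_mwf = wiener_filter(sample_correlation(m), R_ss, ref)

# MWFext: extended signal vector stacking microphone and loudspeaker signals.
m_ext = np.vstack([m, l])
R_ss_ext = np.zeros((M + L, M + L), dtype=complex)
R_ss_ext[:M, :M] = R_ss                          # speech is absent from the loudspeaker rows
w_ext = wiener_filter(sample_correlation(m_ext), R_ss_ext, ref)

def mse(estimate):
    return float(np.mean(np.abs(s[ref] - estimate) ** 2))

mse_in = mse(m[ref])                   # unprocessed reference microphone
mse_mwf = mse(w_mwf.conj() @ m)        # MWF output
mse_ext = mse(w_ext.conj() @ m_ext)    # MWFext output
print(mse_in, mse_mwf, mse_ext)
```

In this toy scene the extended filter can subtract the echo using the loudspeaker signal directly, whereas the microphone-only filter has to spend spatial degrees of freedom on it, so the extended model is expected to give the lowest error.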

Paper Structure

This paper contains 48 sections, 1 theorem, 107 equations, 5 figures, 3 tables.

Key Result

Theorem 10.1

Define the Hermitian positive-semidefinite matrices $A\in\mathbb{C}^{M\times M}$ and $B\in\mathbb{C}^{M\times M}$, where the column space of $A$ is contained within the column space of $B$. Further define $C\in\mathbb{C}^{M\times L}$, whose column space is contained within the column space of $A$. T
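The hypotheses of the theorem can be checked numerically. The sketch below covers only those hypotheses (Hermitian positive-semidefiniteness and column-space containment), since the theorem's conclusion is cut off in this summary. The helper names, tolerances, and test matrices are assumptions chosen for illustration.

```python
import numpy as np

def is_hermitian_psd(X, tol=1e-10):
    """True if X equals its conjugate transpose and has no eigenvalue below -tol."""
    if not np.allclose(X, X.conj().T, atol=tol):
        return False
    return bool(np.linalg.eigvalsh(X).min() >= -tol)

def column_space_contained(X, Y, tol=1e-8):
    """True if col(X) is contained in col(Y).

    Y @ pinv(Y) is the orthogonal projector onto col(Y); X lies in that
    subspace exactly when the projection leaves X unchanged.
    """
    projector = Y @ np.linalg.pinv(Y, rcond=1e-10)
    residual = np.linalg.norm(projector @ X - X)
    return bool(residual <= tol * max(1.0, np.linalg.norm(X)))

# Hypothetical example with M = 4 and L = 3.
rng = np.random.default_rng(0)
M, L = 4, 3
V = rng.standard_normal((M, 2)) + 1j * rng.standard_normal((M, 2))
B = V @ V.conj().T                      # rank-2 Hermitian PSD matrix
A = V[:, :1] @ V[:, :1].conj().T        # rank-1, column space inside col(B)
C = A @ (rng.standard_normal((M, L)) + 1j * rng.standard_normal((M, L)))  # col(C) inside col(A)

print(is_hermitian_psd(A), is_hermitian_psd(B))
print(column_space_contained(A, B), column_space_contained(C, A))
print(column_space_contained(B, A))     # the reverse containment does not hold here
```

Both `A` and `B` are deliberately rank deficient, which is the regime in which the abstract notes that the MWFext is non-unique.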

Figures (5)

  • Figure 1: Algorithms for combined acoustic echo cancellation (AEC) and noise reduction (NR) aim at providing an estimate of the desired speech $\hat{s}_r$ in reference microphone $r$ by suppressing the near-end room noise signal vector $\mathbf{n}$ and echo signal vector $\mathbf{e}$, originating from the loudspeaker signals $l_j$, $j\in\{1,\dots,L\}$, by means of the echo path map $F(\cdot)$. To this end, the microphone signal vector $\mathbf{m}$ and possibly the loudspeaker signals $l_j$ are utilised. The figurines have been generated using tikzpeople.
  • Figure 2: Cascade interpretation of the integrated algorithms. (a) The MWF is obtained using the microphone signal model. (b) The MWFext is obtained using the extended signal model, which can be shown to be theoretically equivalent to (c) the AEC-NR, (d) the NR-AEC, and (e) the NRext-AEC-PF.
  • Figure 3: Performance attained by each filter in the NRext-AEC-PF by means of the intelligibility-weighted signal-to-echo ratio improvement $\left(\Delta\text{SERI}\right)$, intelligibility-weighted signal-to-noise ratio improvement $\left(\Delta\text{SNRI}\right)$, and intelligibility-weighted speech distortion. The mean performance is represented by the solid dots and the standard deviation by the shading. The performance generally increases with each filter in the cascade.
  • Figure 4: Performance comparison between the integrated algorithms for Setup-1. The mean performance is represented by the solid line and the standard deviation by the shading. Due to non-stationarities and imperfect correlation matrix estimation, the theoretically equivalent integrated algorithms differ in their practical performances. Of these, the AEC-NR and NRext-AEC-PF generally attain the best performance.
  • Figure 5: Performance comparison for Setup-2. The NRext-AEC-PF is still effective despite the additive map assumption no longer holding true due to loudspeaker non-linearities, and performs similarly to the AEC-NR. The mean performance only drops slightly when practical VADs are used.
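The improvement metrics named in the captions of Figures 3 to 5 can be approximated by a band-weighted SNR difference, sketched below. This is a simplified stand-in and not the paper's exact definition. The band edges and weights are placeholders rather than standardised band-importance values, the function names are hypothetical, and the sketch assumes that the speech and noise components are available separately at both the input and the output of the filter.

```python
import numpy as np

def band_powers(x, fs, band_edges):
    """Power of x inside each [lo, hi) frequency band, from a one-sided FFT."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.array([spectrum[(freqs >= lo) & (freqs < hi)].sum()
                     for lo, hi in band_edges])

def weighted_snr_db(speech, noise, fs, band_edges, weights):
    """Weighted average of the per-band SNR in dB."""
    snr_db = 10.0 * np.log10(band_powers(speech, fs, band_edges)
                             / band_powers(noise, fs, band_edges))
    weights = np.asarray(weights, dtype=float)
    return float(np.dot(weights, snr_db) / weights.sum())

def weighted_snr_improvement(speech_in, noise_in, speech_out, noise_out,
                             fs, band_edges, weights):
    """Output minus input band-weighted SNR, in dB."""
    return (weighted_snr_db(speech_out, noise_out, fs, band_edges, weights)
            - weighted_snr_db(speech_in, noise_in, fs, band_edges, weights))

# Toy check: a filter that leaves the speech untouched and scales the noise by 0.1
# improves the SNR by 20 dB in every band, whatever the weights are.
rng = np.random.default_rng(0)
fs = 16000
speech_in = rng.standard_normal(fs)
noise_in = rng.standard_normal(fs)
speech_out, noise_out = speech_in, 0.1 * noise_in
bands = [(100, 1000), (1000, 4000), (4000, 8000)]   # placeholder band edges in Hz
weights = [0.3, 0.5, 0.2]                           # placeholder band weights
delta = weighted_snr_improvement(speech_in, noise_in, speech_out, noise_out,
                                 fs, bands, weights)
print(round(delta, 6))
```

The same functions give an echo-related counterpart if the echo component is passed in place of the noise component.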

Theorems & Definitions (6)

  • proof
  • proof
  • proof
  • proof
  • Theorem 10.1
  • proof