
Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios

Nils L. Westhausen, Hendrik Kayser, Theresa Jansen, Bernd T. Meyer

TL;DR

In diffuse noise, all algorithms perform similarly. In the presence of spatial interferers, the binaural deep learning approach performs best, an advantage that can be attributed to improvements at low SNRs and to precise spatial filtering.

Abstract

Deep learning has the potential to enhance speech signals and increase their intelligibility for users of hearing aids. Deep models suited for real-world application should feature a low computational complexity and low processing delay of only a few milliseconds. In this paper, we explore deep speech enhancement that matches these requirements and contrast monaural and binaural processing algorithms in two complex acoustic scenes. Both algorithms are evaluated with objective metrics and in experiments with hearing-impaired listeners performing a speech-in-noise test. Results are compared to two traditional enhancement strategies, i.e., adaptive differential microphone processing and binaural beamforming. While in diffuse noise, all algorithms perform similarly, the binaural deep learning approach performs best in the presence of spatial interferers. Through a post-analysis, this can be attributed to improvements at low SNRs and to precise spatial filtering.

Paper Structure (27 sections, 3 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: Illustration of the proposed approach for spatial filtering and post filtering. The filters for the left and right side are estimated by separate models. The dashed-dotted line symbolizes an optional exchange of the complex TF representation of microphone signals for binaural input features.
  • Figure 2: Illustration of the proposed filter estimation model. Reshape operations marked with (*) include an axis permutation.
  • Figure 3: Hearing thresholds (HT) in dB hearing level (HL) of all subjects. The colored lines show the mean HT and the standard deviation. Individual audiograms are shown in gray.
  • Figure 4: Results for different HA enhancement strategies for subjective listening tests (left column) and objective metrics HASPI (middle column) and MBSTOI (right column). The mean is marked by black bars and the median by white circles inside the violin. The axes are reversed for the left column. Violin plots are shown for two acoustic scenes (top and bottom row, respectively). The $r$-values show the correlation of the objective metric with the subjective measurements, both on subject level ($r_{sub}$) and for the medians ($r_{med}$).
  • Figure 5: Objective metrics in terms of HASPI (first row) and MBSTOI (second row) pooled over all subjects plotted against SNR for both acoustic scenes.
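Figure 1 describes estimating separate filters for the left and right side and applying them to the complex time-frequency (TF) representation of the microphone signals. The sketch below shows one common way such per-bin complex weights are applied, a filter-and-sum over microphones followed by an optional real-valued gain. It is an illustration under stated assumptions, not the authors' implementation: the conjugate convention, the array shapes, the post-filter gain `G`, the function name `apply_multichannel_filter`, and the four-microphone setup are all hypothetical.

```python
import numpy as np


def apply_multichannel_filter(W, X, G=None):
    """Filter-and-sum a multichannel STFT with per-bin complex weights.

    Hypothetical helper for illustration only.
    W, X: complex arrays of shape (mics, frames, freqs).
    G: optional real-valued post-filter gain of shape (frames, freqs);
       an assumed stand-in for a post-filtering stage.
    Returns the single-channel output STFT of shape (frames, freqs).
    """
    if W.shape != X.shape:
        raise ValueError("W and X must have the same shape")
    # Spatial filtering (assumed convention): Y(t, f) = sum_m conj(W_m(t, f)) * X_m(t, f)
    Y = np.sum(np.conj(W) * X, axis=0)
    if G is not None:
        # Post filtering: element-wise gain on the spatially filtered spectrum
        Y = G * Y
    return Y


# Toy input: 4 microphones (assumed, e.g. two per device), 10 frames, 65 bins
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 10, 65)) + 1j * rng.standard_normal((4, 10, 65))

# Left and right outputs use separate weights, mirroring the separate models in Figure 1.
# One-hot weights simply pass one reference microphone through (a sanity check, not a learned filter).
W_left = np.zeros_like(X)
W_left[0] = 1.0
W_right = np.zeros_like(X)
W_right[2] = 1.0

Y_left = apply_multichannel_filter(W_left, X)
Y_right = apply_multichannel_filter(W_right, X)
print(np.allclose(Y_left, X[0]), np.allclose(Y_right, X[2]))  # True True
```

In a trained system, `W_left` and `W_right` would be predicted frame by frame by the respective models rather than fixed; passing the other side's TF representation to each model corresponds to the optional binaural exchange drawn as the dash-dotted line.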
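Figure 4 reports two correlations between an objective metric and the listening-test results: one over all subject-level data points (r_sub) and one over the per-condition medians (r_med). The sketch below shows how both values could be computed, assuming Pearson's correlation and data arranged as one row per enhancement strategy and one column per subject. The function names and the toy numbers are hypothetical and are not the paper's data.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length 1-D arrays."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))


def subject_and_median_correlation(subjective, objective):
    """Return (r_sub, r_med) for arrays of shape (n_conditions, n_subjects).

    r_sub: correlation over all paired (condition, subject) values.
    r_med: correlation of the per-condition medians across subjects.
    """
    subjective = np.asarray(subjective, dtype=float)
    objective = np.asarray(objective, dtype=float)
    r_sub = pearson_r(subjective.ravel(), objective.ravel())
    r_med = pearson_r(np.median(subjective, axis=1), np.median(objective, axis=1))
    return r_sub, r_med


# Synthetic example: 3 conditions x 4 subjects of a lower-is-better listening score
subjective = np.array([
    [-2.0, -1.0, 0.5, 1.0],
    [-4.0, -3.5, -2.0, -1.5],
    [-6.0, -5.0, -4.5, -3.0],
])
# A higher-is-better metric that is an exact decreasing linear function of the score
objective = 0.5 - 0.05 * subjective

r_sub, r_med = subject_and_median_correlation(subjective, objective)
print(r_sub, r_med)  # both close to -1.0
```

With a lower-is-better subjective measure (which the reversed axes in the left column suggest) and a higher-is-better metric such as HASPI or MBSTOI, a well-matched metric yields a negative r. Correlating medians removes between-subject spread, so r_med can be much stronger than r_sub on real data.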