Real-time multichannel deep speech enhancement in hearing aids: Comparing monaural and binaural processing in complex acoustic scenarios

Nils L. Westhausen; Hendrik Kayser; Theresa Jansen; Bernd T. Meyer

TL;DR

In diffuse noise, all algorithms perform similarly, whereas the binaural deep learning approach performs best in the presence of spatial interferers. This advantage can be attributed to improvements at low SNRs and to precise spatial filtering.

Abstract

Deep learning has the potential to enhance speech signals and increase their intelligibility for users of hearing aids. Deep models suited for real-world application should feature low computational complexity and a low processing delay of only a few milliseconds. In this paper, we explore deep speech enhancement that meets these requirements and contrast monaural and binaural processing algorithms in two complex acoustic scenes. Both algorithms are evaluated with objective metrics and in experiments with hearing-impaired listeners performing a speech-in-noise test. Results are compared to two traditional enhancement strategies, i.e., adaptive differential microphone processing and binaural beamforming. In diffuse noise, all algorithms perform similarly, whereas the binaural deep learning approach performs best in the presence of spatial interferers. A post-analysis attributes this advantage to improvements at low SNRs and to precise spatial filtering.

Paper Structure (27 sections, 3 equations, 6 figures, 1 table)

Introduction
Methods
Deep spatial and post filtering
Architecture
Training data generation
Model and training configuration
Baseline algorithms
Complex acoustic scenes
Hearing-aid configuration
Subjective measurement procedure
Subjects
Objective evaluation
Results
Results of subjective measurement of speech intelligibility
Results in terms of HASPI
...and 12 more sections

Figures (6)

Figure 1: Illustration of the proposed approach for spatial filtering and post filtering. The filters for the left and right side are estimated by separate models. The dashed-dotted line symbolizes an optional exchange of the complex TF representation of microphone signals for binaural input features.
Figure 2: Illustration of the proposed filter estimation model. Reshape operations with (*) include axes permutation.
Figure 3: Hearing thresholds (HT) in dB hearing level (HL) of all subjects. The colored lines show the mean HT and the standard deviation. Individual audiograms are shown in gray.
Figure 4: Results for different HA enhancement strategies for subjective listening tests (left column) and objective metrics HASPI (middle column) and MBSTOI (right column). The mean is marked by black bars and the median by white circles inside the violin. The axes are reversed for the left column. Violin plots are shown for two acoustic scenes (top and bottom row, respectively). The $r$-values show the correlation of the objective metric with the subjective measurements, both on subject level ($r_{sub}$) and for the medians ($r_{med}$).
Figure 5: Objective metrics in terms of HASPI (first row) and MBSTOI (second row) pooled over all subjects plotted against SNR for both acoustic scenes.
...and 1 more figure
