End-to-End Multi-Task Learning for Adjustable Joint Noise Reduction and Hearing Loss Compensation

Philippe Gonzalez, Vera Margrethe Frederiksen, Torsten Dau, Tobias May

Abstract

A multi-task learning framework is proposed for optimizing a single deep neural network (DNN) for joint noise reduction (NR) and hearing loss compensation (HLC). A distinct training objective is defined for each task, and the DNN predicts two time-frequency masks. During inference, the amounts of NR and HLC can be adjusted independently by exponentiating each mask before combining them. In contrast to recent approaches that rely on training an auditory-model emulator to define a differentiable training objective, we propose an auditory model that is inherently differentiable, thus allowing end-to-end optimization. The audiogram is provided as an input to the DNN, thereby enabling listener-specific personalization without the need for retraining. Results show that the proposed approach not only allows adjusting the amounts of NR and HLC individually, but also improves objective metrics compared to optimizing a single training objective. It also outperforms a cascade of two DNNs that were separately trained for NR and HLC, and shows competitive HLC performance compared to a traditional hearing-aid prescription. To the best of our knowledge, this is the first study that uses an auditory model to train a single DNN for both NR and HLC across a wide range of listener profiles.
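
The inference-time adjustment described in the abstract can be illustrated with a minimal sketch. It assumes real-valued, non-negative masks and assumes that the two exponentiated masks are combined by element-wise multiplication with the noisy spectrogram; the abstract does not state the combination rule, and the function name, argument names, and nested-list layout are hypothetical. The exponents correspond to the $\alpha_\mathrm{NR}$ and $\alpha_\mathrm{HLC}$ parameters mentioned in the Figure 5 caption.

```python
def apply_adjustable_masks(noisy_stft, mask_nr, mask_hlc, alpha_nr=1.0, alpha_hlc=1.0):
    """Apply two exponentiated time-frequency masks to a noisy spectrogram.

    All inputs are nested lists indexed as [frequency][time]. The masks are
    assumed to be real-valued and non-negative (an assumption of this sketch).
    Combining by multiplication is also an assumption, not the paper's stated rule.
    """
    return [
        [
            x * (m_nr ** alpha_nr) * (m_hlc ** alpha_hlc)
            for x, m_nr, m_hlc in zip(row_x, row_nr, row_hlc)
        ]
        for row_x, row_nr, row_hlc in zip(noisy_stft, mask_nr, mask_hlc)
    ]


# Toy 2x2 spectrogram with hypothetical mask values.
noisy = [[1 + 1j, 2 + 0j], [0.5j, -1 + 0j]]
m_nr = [[0.25, 1.0], [0.5, 0.0]]   # attenuation for noise reduction
m_hlc = [[2.0, 4.0], [1.0, 3.0]]   # gain for hearing loss compensation

# An exponent of 0 turns a mask into all ones, so that task is switched off.
hlc_only = apply_adjustable_masks(noisy, m_nr, m_hlc, alpha_nr=0.0, alpha_hlc=1.0)
# An exponent between 0 and 1 applies a milder version of the mask.
mild_nr = apply_adjustable_masks(noisy, m_nr, m_hlc, alpha_nr=0.5, alpha_hlc=1.0)
```

Under these assumptions, setting both exponents to 0 returns the unprocessed input, and setting both to 1 applies the masks exactly as predicted by the DNN.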

Paper Structure

This paper contains 28 sections, 14 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: Traditional single-DNN frameworks for (a) NR-only, (b) HLC-only, and (c) joint NR and HLC using auditory models.
  • Figure 2: Proposed multi-task framework for adjustable joint NR and HLC using a single DNN and two auditory models. During training (a), the DNN predicts both a denoised and a compensated signal, and a distinct training objective is defined for each output. During inference (b), the amount of NR and HLC can be adjusted independently by exponentiating each predicted time-frequency mask.
  • Figure 3: BSRNN-based DNN backbone. The band-split module projects $K$ frequency bands with increasing width $\{G_i\}_{i=1}^K$ onto a fixed number of channels $N$. Interleaved sequence modelling across time and bands is performed using residual LSTM blocks in $L$ layers. Features extracted from the audiogram are included in each layer using FiLM. The band-merging module projects each band back to the original frequency resolution and outputs real- or complex-valued masks $M_\mathrm{NR}$ and $M_\mathrm{HLC}$.
  • Figure 4: Overview of the proposed differentiable auditory model.
  • Figure 5: Objective metrics for real- or complex-valued masks and MSE or MAE loss. For PESQ, ESTOI, and SNR, an NH audiogram is provided to systems providing HLC. For HASPI and HASQI, results are averaged over all audiograms. Both $\alpha_\mathrm{NR}$ and $\alpha_\mathrm{HLC}$ are set to 1 for DNN-CNRHLC.
  • ...and 3 more figures
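
The Figure 3 caption states that audiogram features are included in each layer using FiLM. As a rough illustration, FiLM (feature-wise linear modulation) scales and shifts each feature channel with parameters computed from a conditioning vector. The sketch below assumes a single linear projection from a hypothetical audiogram embedding to per-channel scale and shift values; the function name, weight shapes, and the way the embedding is obtained are assumptions and are not taken from the paper.

```python
def film(features, audiogram_embedding, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation of a [channels][frames] feature map.

    audiogram_embedding: list of D conditioning values (hypothetical).
    w_gamma, w_beta:     [channels][D] projection weights (hypothetical).
    b_gamma, b_beta:     [channels] projection biases (hypothetical).
    """
    def linear(weights, bias):
        # One output value per channel: dot(weights[c], embedding) + bias[c].
        return [
            sum(w * a for w, a in zip(row, audiogram_embedding)) + b
            for row, b in zip(weights, bias)
        ]

    gamma = linear(w_gamma, b_gamma)  # per-channel scale
    beta = linear(w_beta, b_beta)     # per-channel shift
    # The same scale and shift are applied to every frame of a channel.
    return [
        [g * x + b for x in channel]
        for channel, g, b in zip(features, gamma, beta)
    ]
```

Because the scale and shift depend only on the conditioning vector, the same network weights can serve different listeners, which is consistent with the abstract's point that personalization does not require retraining.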