Controllable joint noise reduction and hearing loss compensation using a differentiable auditory model

Philippe Gonzalez, Torsten Dau, Tobias May

TL;DR

This work tackles the lack of ground-truth targets in deep learning-based hearing loss compensation by using a differentiable auditory model within a multi-task framework that jointly performs noise reduction (NR) and hearing loss compensation (HLC). The speech processor produces denoised and compensated outputs, which are fused at inference via a controllable mixing parameter $\alpha$, while training is guided by two task-specific losses weighted through an uncertainty-based scheme. The approach achieves objective performance comparable to task-specific baselines and allows the balance between NR and HLC to be adjusted at inference time, without retraining the speech processor or an auditory model emulator. Together, audiogram-conditioned FiLM, the differentiable auditory model, and the uncertainty-weighted joint objective yield a practical, listener-adaptive solution for hearing aids that can be tuned to the environment and user preferences.
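The fusion step described above can be illustrated with a minimal sketch. The summary does not state the exact mixing rule, so the linear interpolation below, the direction of $\alpha$ (0 for NR only, 1 for HLC only), and the function name `mix_outputs` are all assumptions made for illustration.

```python
import numpy as np


def mix_outputs(denoised, compensated, alpha):
    """Blend two processor outputs with a mixing parameter alpha in [0, 1].

    ASSUMPTION: linear interpolation, with alpha = 0 returning the denoised
    signal and alpha = 1 returning the compensated signal. The actual fusion
    rule used in the paper may differ.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    denoised = np.asarray(denoised, dtype=float)
    compensated = np.asarray(compensated, dtype=float)
    if denoised.shape != compensated.shape:
        raise ValueError("both outputs must have the same shape")
    # Weighted sum of the two signals; no retraining is involved.
    return (1.0 - alpha) * denoised + alpha * compensated


# Hypothetical toy signals standing in for the two network outputs.
denoised = np.array([1.0, 1.0, 1.0])
compensated = np.array([3.0, 3.0, 3.0])
balanced = mix_outputs(denoised, compensated, alpha=0.5)
```

Because $\alpha$ only enters after the network has produced both outputs, changing it under this assumption costs a single weighted sum per sample.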

Abstract

Deep learning-based hearing loss compensation (HLC) seeks to enhance speech intelligibility and quality for hearing-impaired listeners using neural networks. One major challenge of HLC is the lack of a ground-truth target. Recent works have used neural networks to emulate non-differentiable auditory peripheral models in closed-loop frameworks, but this approach lacks flexibility. Alternatively, differentiable auditory models allow direct optimization, yet previous studies focused on individual listener profiles, or on joint noise reduction (NR) and HLC without balancing the two tasks. This work formulates NR and HLC as a multi-task learning problem, training a system to simultaneously predict denoised and compensated signals from noisy speech and audiograms using a differentiable auditory model. Results show that the system achieves objective metric performance similar to that of systems trained for each task separately, while allowing the balance between NR and HLC to be adjusted during inference.
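The TL;DR mentions that the two task losses are combined through an uncertainty-based scheme. The sketch below shows one common formulation of such weighting, in which each task $i$ has a learnable log-variance $s_i$ and the total loss is $\sum_i \left( e^{-s_i} L_i + s_i \right)$. Whether the paper uses exactly this form is an assumption; the function name and the numeric values are hypothetical.

```python
import math

import numpy as np


def uncertainty_weighted_loss(task_losses, log_variances):
    """Combine task losses using learnable log-variances.

    ASSUMPTION: total = sum_i exp(-s_i) * L_i + s_i, a common form of
    uncertainty-based multi-task weighting. The paper's exact variant
    is not given in this summary.
    """
    task_losses = np.asarray(task_losses, dtype=float)
    log_variances = np.asarray(log_variances, dtype=float)
    if task_losses.shape != log_variances.shape:
        raise ValueError("one log-variance is needed per task loss")
    # exp(-s_i) down-weights tasks with high estimated uncertainty;
    # the + s_i term keeps s_i from growing without bound.
    return float(np.sum(np.exp(-log_variances) * task_losses + log_variances))


# Hypothetical NR and HLC loss values for a single training step.
losses = [2.0, 4.0]
equal_weights = uncertainty_weighted_loss(losses, [0.0, 0.0])
nr_downweighted = uncertainty_weighted_loss(losses, [math.log(2.0), 0.0])
```

In a real training loop the log-variances would be trainable parameters updated by the optimizer alongside the network weights; here they are fixed numbers to keep the example self-contained.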

Paper Structure

This paper contains 12 sections, 6 equations, 4 figures, 1 table.

Figures (4)

  • Figure 1: Typical framework for NR-only, HLC-only, or joint NR and HLC without control.
  • Figure 2: Proposed framework for controllable joint NR and HLC.
  • Figure 3: Speech processor based on luo2023music. Features extracted from the audiogram are fed to each layer using FiLM conditioning. The mask estimation module outputs a complex-valued mask and residual spectrogram for the denoised signal, the compensated signal, or both.
  • Figure 4: and as a function of the mixing parameter $\alpha$, when using audiograms and loss.