DFingerNet: Noise-Adaptive Speech Enhancement for Hearing Aids
Iosif Tsangko, Andreas Triantafyllopoulos, Michael Müller, Hendrik Schröter, Björn W. Schuller
TL;DR
This work tackles the challenge of adapting lightweight hearing-aid speech enhancement models to diverse acoustic environments. It introduces DFingerNet (DFiN), an extension of the DFN architecture that adds a dedicated fingerprint encoder to condition denoising on environment-specific noise fingerprints. Fusion strategies include additive and attention-based methods, and the main model is kept pretrained. For data, the first second of each noise recording is cropped to serve as the fingerprint and noise is mixed in at random SNRs; evaluation uses VCTK with DEMAND, FSD50K, and ESC-50. Results show gains in SI-SDR, PESQ, STOI, and DNSMOS, robust performance under fingerprint mismatches and distribution shifts, and practical feasibility, since fingerprint processing can run off-device and the fingerprint is optional.
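The data strategy summarised above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the helper names `crop_fingerprint` and `mix_at_snr`, the 16 kHz sample rate, the SNR range of -5 to 20 dB, and the choice to mix only the part of the noise that follows the fingerprint are all hypothetical.

```python
import numpy as np


def crop_fingerprint(noise, sample_rate, seconds=1.0):
    """Split a noise clip into its leading `seconds` (the fingerprint) and the rest."""
    n = int(round(seconds * sample_rate))
    return noise[:n], noise[n:]


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    # Repeat or trim the noise so it covers the whole speech signal.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Solve speech_power / (scale^2 * noise_power) = 10^(snr_db / 10) for scale.
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise


# Synthetic stand-ins for a speech utterance and a noise recording (assumed 16 kHz).
rng = np.random.default_rng(0)
sample_rate = 16000
speech = rng.standard_normal(3 * sample_rate)
noise = rng.standard_normal(4 * sample_rate)

fingerprint, remaining_noise = crop_fingerprint(noise, sample_rate)
snr_db = rng.uniform(-5.0, 20.0)  # hypothetical SNR range
noisy = mix_at_snr(speech, remaining_noise, snr_db)
```

Here `fingerprint` is what a fingerprint encoder would receive, while `noisy` is the input to the denoiser.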
Abstract
The DeepFilterNet (DFN) architecture was recently proposed as a deep learning model suited for hearing aid devices. Despite its competitive performance on numerous benchmarks, it still follows a 'one-size-fits-all' approach, which aims to train a single, monolithic architecture that generalises across different noises and environments. However, its limited size and computation budget can hamper its generalisability. Recent work has shown that in-context adaptation can mitigate this limitation: conditioning the denoising process on additional information extracted from background recordings improves performance. The processing of these recordings can be offloaded outside the hearing aid, thus improving performance while adding minimal computational overhead. We introduce these principles to the DFN model, thus proposing the DFingerNet (DFiN) model, which shows superior performance on various benchmarks inspired by the DNS Challenge.
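
To make the idea of conditioning the denoising process on information from a background recording more concrete, the sketch below shows two generic ways a fingerprint representation could be combined with a denoiser's intermediate features. It is an illustration only and assumes details not given here: the function names, the tensor shapes, and the absence of learned projections are hypothetical, and the actual DFiN layers may differ.

```python
import numpy as np


def additive_fusion(features, fingerprint_embedding):
    """Add one fingerprint vector to every frame of the denoiser features.

    features: array of shape (frames, channels)
    fingerprint_embedding: array of shape (channels,)
    """
    if features.shape[-1] != fingerprint_embedding.shape[-1]:
        raise ValueError("channel dimensions must match")
    # Broadcasting repeats the embedding along the time axis.
    return features + fingerprint_embedding[np.newaxis, :]


def attention_fusion(features, fingerprint_frames):
    """Let each feature frame attend over the fingerprint frames.

    features: array of shape (frames, channels), used as queries
    fingerprint_frames: array of shape (fp_frames, channels), used as keys and values
    """
    # Scaled dot-product scores between every feature frame and fingerprint frame.
    scores = features @ fingerprint_frames.T / np.sqrt(features.shape[-1])
    # Numerically stable softmax over the fingerprint frames.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    # Residual connection: the original features plus the attended fingerprint context.
    return features + weights @ fingerprint_frames


rng = np.random.default_rng(0)
features = rng.standard_normal((100, 64))        # hypothetical denoiser features
fingerprint_frames = rng.standard_normal((50, 64))  # hypothetical encoded fingerprint

fused_add = additive_fusion(features, fingerprint_frames.mean(axis=0))
fused_att = attention_fusion(features, fingerprint_frames)
```

Both variants return an array with the same shape as `features`, so in this sketch the rest of the denoiser would not need to change to accept the conditioned features.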
