Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge
Keren Shao, Ke Chen, Shlomo Dubnov
TL;DR
This work targets music enhancement for hearing-aid users in the ICASSP 2024 Cadenza Challenge. It integrates the deep filters of DeepFilterNet into a Spec-UNet-based network that refines a remixing pipeline built on a pre-trained hdemucs baseline. The network takes a concatenation of STFTs as input and outputs either a complex ratio mask or a deep filter of order $N$. The loss is computed on the inverse-STFT reconstruction. Experiments on MUSDB18 show incremental SDR and HAAQI gains from the deep-filter mechanism, most notably when the complex ratio mask is replaced with a deep filter. Some baselines, such as the pure DeepFilterNet variant, underperform. The findings suggest that spectro-temporal filtering aligned with temporal fine structure can improve perceptual quality for hearing-aid users. Generalization to new listeners remains an open challenge for future work.
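The TL;DR contrasts a complex ratio mask with a deep filter of order $N$. The minimal NumPy sketch below shows the difference. It assumes the formulation popularized by DeepFilterNet, in which each output bin is a complex-weighted sum of $N$ frames of the same frequency bin, $Y[t,f] = \sum_{i=0}^{N-1} C[t,i,f]\,X[t-i+l,f]$, with lookahead $l$. The function names `apply_deep_filter` and `apply_crm` are hypothetical and not the report's implementation. The same holds for the (time, taps, frequency) array layout and the zero padding at the edges.

```python
import numpy as np


def apply_deep_filter(spec, coefs, lookahead=0):
    """Apply a complex deep filter of order N along the time axis.

    spec:  complex array (T, F), the input STFT X[t, f]
    coefs: complex array (T, N, F), filter taps C[t, i, f]
    Returns Y[t, f] = sum_i C[t, i, f] * X[t - i + lookahead, f],
    where frames outside [0, T) are treated as zeros.
    """
    T, F = spec.shape
    n_taps = coefs.shape[1]
    if not 0 <= lookahead < n_taps:
        raise ValueError("lookahead must satisfy 0 <= lookahead < N")
    # Zero-pad past and future context so every tap index is valid.
    past = n_taps - 1 - lookahead
    padded = np.pad(spec, ((past, lookahead), (0, 0)))
    out = np.zeros((T, F), dtype=np.result_type(spec, coefs))
    for i in range(n_taps):
        # Original frame t - i + lookahead sits at padded row t + (n_taps - 1 - i).
        start = n_taps - 1 - i
        out += coefs[:, i, :] * padded[start:start + T, :]
    return out


def apply_crm(spec, mask):
    """Complex ratio mask: Y[t, f] = M[t, f] * X[t, f] (one tap, no context)."""
    return mask * spec


rng = np.random.default_rng(0)
T, F = 8, 5
X = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
M = rng.standard_normal((T, F)) + 1j * rng.standard_normal((T, F))
# An order-1 deep filter without lookahead reduces to the complex ratio mask.
print(np.allclose(apply_deep_filter(X, M[:, None, :]), apply_crm(X, M)))  # prints True
```

The order-1 check makes the design choice visible. A complex ratio mask can only rescale and rotate each time-frequency bin independently. A deep filter with $N > 1$ can also combine neighbouring frames, which is the property the report associates with better handling of temporal fine structure.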
Abstract
In this challenge, we disentangle the deep filters from the original DeepFilterNet and incorporate them into our Spec-UNet-based network to further improve a hybrid Demucs (hdemucs)-based remixing pipeline. The motivation for using the deep filter component lies in its potential to better handle temporal fine structure. We demonstrate incremental improvements in both the Signal-to-Distortion Ratio (SDR) and the Hearing Aid Audio Quality Index (HAAQI) metrics when comparing hdemucs against different versions of our model.
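SDR is one of the two headline metrics. The minimal sketch below shows one common, simple definition used in music source separation: the energy ratio between the reference and the residual error, in dB. The function name `sdr_db` and the `eps` stabiliser are illustrative assumptions. The sketch also does not claim to be the exact SDR variant used in the challenge evaluation, which may differ (e.g., the BSSEval formulation). HAAQI, which models the listener's hearing loss, is not reproduced here.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-8):
    """Signal-to-distortion ratio in dB: 10*log10(||s||^2 / ||s - s_hat||^2).

    eps guards against division by zero and log of zero for silent signals.
    """
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    return 10.0 * np.log10((signal_energy + eps) / (error_energy + eps))


# An estimate at 90% amplitude leaves an error with 1% of the signal energy.
s = np.sin(np.linspace(0, 20 * np.pi, 4000))
print(round(sdr_db(s, 0.9 * s), 2))  # prints 20.0
```

Because the scale is logarithmic, an incremental gain of 1 dB at a fixed reference corresponds to roughly a 21% reduction in error energy.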
