Music Enhancement with Deep Filters: A Technical Report for The ICASSP 2024 Cadenza Challenge

Keren Shao, Ke Chen, Shlomo Dubnov

TL;DR

This work targets music enhancement for hearing-aid users in the ICASSP 2024 Cadenza Challenge by integrating DeepFilterNet's deep-filter approach into a Spec-UNet-based remixing pipeline built on a pre-trained hdemucs baseline. The model processes a concatenation of STFTs and can output either a complex ratio mask or a deep-filter of order $N$, with a loss grounded in the inverse STFT reconstruction. Empirical results on MUSDB18 show incremental SDR and HAAQI gains when employing the deep-filter mechanism, particularly when replacing the complex ratio mask with a deep-filter, though some baselines (e.g., the pure DeepFilterNet variant) underperform. The findings suggest that spectro-temporal filtering aligned with temporal fine structure can improve perceptual quality for hearing-aid users, but generalization to new listeners remains an open challenge for future work.

Abstract

In this challenge, we disentangle the deep filters from the original DeepFilterNet and incorporate them into our Spec-UNet-based network to further improve a hybrid Demucs (hdemucs) based remixing pipeline. The motivation for using the deep filter component lies in its potential to better handle temporal fine structure. We demonstrate an incremental improvement in both the Signal-to-Distortion Ratio (SDR) and the Hearing Aid Audio Quality Index (HAAQI) when comparing hdemucs against different versions of our model.
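
To make the difference between the two output types concrete, the following is a minimal NumPy sketch of how a complex ratio mask and a deep filter might each be applied to an STFT. It is an illustration under stated assumptions, not the authors' implementation: the function names `apply_crm` and `apply_deep_filter`, the tensor layouts, and the convention that a filter of order $N$ has $N + 1$ causal taps (no look-ahead frames) are all hypothetical choices made here for clarity.

```python
import numpy as np


def apply_crm(spec, mask):
    """Complex ratio mask: one complex gain per time-frequency bin.

    spec: (T, F) complex STFT
    mask: (T, F) complex mask
    """
    return spec * mask


def apply_deep_filter(spec, coefs):
    """Deep filter: a short complex FIR filter per bin, run along the time axis.

    spec:  (T, F) complex STFT
    coefs: (T, K, F) complex taps; K = N + 1 for order N (assumed convention)
    Each output bin is the inner product of the K taps with the current
    frame and the K - 1 preceding frames at the same frequency.
    """
    T, F = spec.shape
    K = coefs.shape[1]
    # Zero-pad the past so that frame t can look back K - 1 frames.
    padded = np.concatenate([np.zeros((K - 1, F), dtype=spec.dtype), spec], axis=0)
    out = np.zeros_like(spec)
    for i in range(K):
        # padded[K-1-i : K-1-i+T] is the spectrogram delayed by i frames.
        out += coefs[:, i, :] * padded[K - 1 - i : K - 1 - i + T]
    return out
```

Under these assumptions, a deep filter with a single tap (order 0) reduces to a complex ratio mask, which is one way to see the deep filter as a generalization that can additionally combine information across neighboring frames.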

Paper Structure

This paper contains 8 sections, 2 equations, 2 figures, 1 table.

Figures (2)

  • Figure 1: Cadenza Challenge Pipeline. The circled numbers 1, 2, and 3 indicate the components of the input to our designed model. The model output then replaces the role of component 3 and is compared against the ground truth remix.
  • Figure 2: Our Model Architecture. The dot represents vector inner product. 'pre-NALR' corresponds to the stereo remix in locations 1 and 2, while 'NALRed' corresponds to the remix in location 3 in Figure 1.