Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music

Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen

TL;DR

This work addresses the demixing of cross-talk stereo music, with the goal of enabling personalized remixing for listeners with hearing impairment. It introduces a sub-band and full-band interactive U-Net built on dual-path recurrent neural networks (DPRNNs): the masks estimated for each band are fused in an interactive block, and an MLP-based neural beamformer then refines the result by adjusting phase. Evaluated in the ICASSP 2024 Cadenza Challenge setting, the method achieves HAAQI scores comparable to or better than those of OpenUnmix and HDemucs, with further gains from ensembling. The results indicate that multi-band fusion and learned beamforming can suppress cross-talk while preserving perceptual quality, enabling more flexible listening experiences.

Abstract

This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system achieves better performance than the official baselines.

Paper Structure (8 sections, 2 equations, 2 figures, 2 tables)

Figures (2)

  • Figure 1: The architecture of the proposed sub-band and full-band interactive U-Net with DPRNN for demixing cross-talk stereo music
  • Figure 2: The architecture of the U-Net with DPRNN