Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music

Han Yin; Mou Wang; Jisheng Bai; Dongyuan Shi; Woon-Seng Gan; Jianfeng Chen

TL;DR

This work targets demixing cross-talk stereo music to enable personalized remixing for listeners with hearing impairment. It introduces a sub-band and full-band interactive U-Net built on dual-path recurrent neural networks (DPRNNs): masks estimated by the sub-band and full-band branches are fused in an interactive block and then refined by a neural beamformer based on a multilayer perceptron (MLP), which adjusts the phase. The method is evaluated in the ICASSP 2024 Cadenza Challenge setting, where it achieves Hearing-Aid Audio Quality Index (HAAQI) scores comparable to or better than the OpenUnmix and HDemucs baselines, with further gains from ensembling. The results indicate that multi-band fusion and learned beamforming can suppress cross-talk while preserving perceptual quality, enabling more flexible listening experiences.
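
The TL;DR states that an MLP-based neural beamformer refines the fused masks to adjust phase. The paper's exact formulation is not reproduced here; the following Python sketch only illustrates the per-frequency complex filter-and-sum operation that such a beamformer applies to a stereo STFT, Y[o, f, t] = sum_i W[o, i, f] X[i, f, t]. As an assumption for illustration, the weights W are obtained by inverting a hypothetical cross-talk model (each channel hears the other source attenuated by `gain` and delayed by `delay_s`) rather than predicted by a learned MLP, the weights are frequency-dependent but time-invariant, and all helper names are hypothetical.

```python
import numpy as np


def filter_and_sum(X, W):
    """Per-bin complex filter-and-sum across channels.

    X: complex STFT, shape (C_in, F, T), e.g. a stereo mixture (C_in = 2).
    W: complex weights, shape (C_out, C_in, F), one matrix per frequency bin.
    Returns Y with Y[o, f, t] = sum_i W[o, i, f] * X[i, f, t].
    """
    return np.einsum("oif,ift->oft", W, X)


def crosstalk_matrix(freqs_hz, gain=0.5, delay_s=2e-4):
    """Hypothetical symmetric cross-talk model, shape (2, 2, F).

    Each channel contains its own source plus the other source scaled by
    `gain` and delayed by `delay_s` (a pure phase shift at each bin).
    """
    leak = gain * np.exp(-2j * np.pi * freqs_hz * delay_s)
    H = np.empty((2, 2, freqs_hz.size), dtype=complex)
    H[0, 0] = H[1, 1] = 1.0
    H[0, 1] = H[1, 0] = leak
    return H


def oracle_weights(H):
    """Invert the 2x2 mixing matrix at every frequency bin (needs |gain| < 1)."""
    # np.linalg.inv works on the last two axes, so move the frequency axis first.
    return np.linalg.inv(H.transpose(2, 0, 1)).transpose(1, 2, 0)


rng = np.random.default_rng(0)
F, T = 257, 100                       # e.g. 512-point FFT bins, 100 frames
freqs = np.linspace(0.0, 22050.0, F)  # bin center frequencies at 44.1 kHz
S = rng.standard_normal((2, F, T)) + 1j * rng.standard_normal((2, F, T))

H = crosstalk_matrix(freqs)
X = filter_and_sum(S, H)              # cross-talk mixing is itself a per-bin 2x2 filter
W = oracle_weights(H)                 # stand-in for MLP-predicted beamformer weights
S_hat = filter_and_sum(X, W)

print("max error before:", np.max(np.abs(X - S)))
print("max error after: ", np.max(np.abs(S_hat - S)))
```

With oracle weights the two sources are recovered up to floating-point error. In a learned system, an MLP would instead predict W (possibly per frame as well) from the masked mixture, so the phase correction would only be approximate.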

Abstract

This paper describes our proposed method for the ICASSP 2024 Cadenza Challenge in detail. Experimental results show that the proposed system achieves better performance than the official baselines.

Paper Structure

This paper contains 8 sections, 2 equations, 2 figures, and 2 tables.

Introduction
Methods
Proposed Demixing Model
U-Net with DPRNN
Interactive Block
Loss Function
Experiments and Discussion
Conclusions

Figures (2)

Figure 1: The architecture of the proposed sub-band and full-band interactive U-Net with DPRNN for demixing cross-talk stereo music
Figure 2: The architecture of the U-Net with DPRNN