Sub-band and Full-band Interactive U-Net with DPRNN for Demixing Cross-talk Stereo Music
Han Yin, Mou Wang, Jisheng Bai, Dongyuan Shi, Woon-Seng Gan, Jianfeng Chen
TL;DR
This work targets demixing cross-talk stereo music to enable personalized remixing for listeners with hearing impairment. It introduces a sub-band and full-band interactive U-Net built on dual-path recurrent neural networks (DPRNNs). The masks estimated for each band are fused in an interactive block, and an MLP-based neural beamformer then refines the result by adjusting phase. The method is evaluated in the ICASSP 2024 Cadenza Challenge setting, where it achieves Hearing-Aid Audio Quality Index (HAAQI) scores comparable to or better than those of OpenUnmix and HDemucs, with additional gains from ensembling. The results indicate that multi-band fusion and learned beamforming can suppress cross-talk while preserving perceptual quality, enabling more flexible listening experiences.
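As a rough illustration of the sub-band and full-band masking idea summarized above, the following NumPy sketch splits a spectrogram into frequency sub-bands, stitches per-band masks back together, and blends them with a full-band mask. The function names `split_subbands` and `fuse_masks`, the number of bands, the random placeholder masks, and the convex blend are all assumptions made for illustration; the summary does not specify how the interactive block combines the masks, and the DPRNN-based U-Net and the neural beamformer are not modeled here.

```python
import numpy as np


def split_subbands(spec, num_bands):
    """Split a (freq, time) spectrogram into contiguous frequency sub-bands."""
    return np.array_split(spec, num_bands, axis=0)


def fuse_masks(subband_masks, fullband_mask, weight=0.5):
    """Stitch sub-band masks along frequency and blend with the full-band mask.

    The convex blend is a hypothetical stand-in for a learned fusion block.
    """
    stitched = np.concatenate(subband_masks, axis=0)
    if stitched.shape != fullband_mask.shape:
        raise ValueError("stitched sub-band mask and full-band mask differ in shape")
    return weight * stitched + (1.0 - weight) * fullband_mask


rng = np.random.default_rng(0)

# Toy complex spectrogram: 513 frequency bins, 100 time frames (assumed sizes).
mixture = rng.standard_normal((513, 100)) + 1j * rng.standard_normal((513, 100))

bands = split_subbands(mixture, num_bands=4)

# Placeholder masks in [0, 1]; a real system would predict these with networks.
subband_masks = [rng.uniform(size=band.shape) for band in bands]
fullband_mask = rng.uniform(size=mixture.shape)

fused = fuse_masks(subband_masks, fullband_mask)

# Apply the fused real-valued mask to the mixture to get a source estimate.
estimate = fused * mixture
```

A real-valued mask of this kind only rescales magnitudes, which is one reason a later phase-adjusting stage such as the beamformer mentioned above may be useful.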
Abstract
This paper presents a detailed description of our proposed methods for the ICASSP 2024 Cadenza Challenge. Experimental results show that the proposed system achieves better performance than the official baselines.
