The SJTU X-LANCE Lab System for MSR Challenge 2025
Jinxuan Zhu, Hao Qiu, Haina Zhu, Jianwei Yu, Kai Yu, Xie Chen
TL;DR
The paper tackles music source restoration (MSR) for an $8$-instrument mastered mixture by introducing a sequential BS-RoFormer pipeline that denoises, separates, and dereverbs the signal, leveraging pretrained MSS checkpoints and targeted finetuning. It designs a cascaded separation scheme that starts from a frozen $6$-stem model and adds four fine-tuned refinements to reach $8$ stems, paired with a restoration path that denoises all components and dereverbs vocals only. Training employs dataset cleaning and mixing, random mixture augmentation, and longer audio context of up to $10$ s, with losses combining $L_1$ and multi-resolution STFT criteria. The approach achieves top performance on MSRBench, with an MMSNR of $4.4623$ and an FAD of $0.1988$, and is open-sourced at https://github.com/ModistAndrew/xlance-msr. The intended applications include professional remixing and historical audio restoration.
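The combination of $L_1$ and multi-resolution STFT criteria mentioned above can be written in one common form, shown here only as a sketch. The summary does not state the exact formulation, so the weight $\lambda$, the number of resolutions $M$, and the use of an $L_1$ distance on STFT magnitudes are all assumptions; $s$ denotes the target stem and $\hat{s}$ the model output.

```latex
\mathcal{L}(\hat{s}, s)
  = \lVert \hat{s} - s \rVert_1
  + \lambda \sum_{m=1}^{M}
    \bigl\lVert\, \lvert \mathrm{STFT}_m(\hat{s}) \rvert
                - \lvert \mathrm{STFT}_m(s) \rvert \,\bigr\rVert_1
```

Here $\mathrm{STFT}_m$ is the short-time Fourier transform at the $m$-th resolution (a particular window and hop size). The time-domain term penalizes waveform error directly, while the spectral terms compare magnitudes at several time-frequency trade-offs.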
Abstract
This report describes the system submitted to the Music Source Restoration (MSR) Challenge 2025. Our approach is composed of sequential BS-RoFormers, each handling a single task: music source separation (MSS), denoising, or dereverberation. To support the 8 instruments given in the task, we utilize pretrained checkpoints from the MSS community and finetune the MSS model with several training schemes, including (1) mixing and cleaning of datasets; (2) random mixing of music pieces for data augmentation; and (3) scaling up the audio length. Our system ranked first in all three subjective and all three objective evaluation metrics, achieving an MMSNR score of 4.4623 and an FAD score of 0.1988. We have open-sourced all the code and checkpoints at https://github.com/ModistAndrew/xlance-msr.
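Training scheme (2), random mixing of music pieces, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_mixture`, the `stem_pool` layout, the mono clips, and the random gain range are all hypothetical choices made for illustration. The idea shown is simply that each instrument's stem is drawn from an independently chosen piece, and the mixture is the sum of the drawn stems.

```python
import numpy as np

def random_mixture(stem_pool, segment_len, rng, gain_db_range=(-6.0, 6.0)):
    """Build one synthetic training example (hypothetical helper).

    stem_pool: dict mapping instrument name -> list of 1-D float arrays,
               where each array is a mono clip from some music piece.
    Returns (mixture, targets), with targets keyed by instrument name.
    """
    targets = {}
    for name, clips in stem_pool.items():
        # Pick a clip independently per instrument, so the stems in one
        # mixture usually come from different pieces.
        clip = clips[rng.integers(len(clips))]
        if len(clip) >= segment_len:
            # Random crop of the requested length.
            start = rng.integers(len(clip) - segment_len + 1)
            seg = clip[start:start + segment_len]
        else:
            # Zero-pad clips that are shorter than the segment.
            seg = np.pad(clip, (0, segment_len - len(clip)))
        # Random gain in dB (an assumed augmentation, not from the report).
        gain = 10.0 ** (rng.uniform(*gain_db_range) / 20.0)
        targets[name] = (gain * seg).astype(np.float32)
    # The mixture is the sample-wise sum of all selected stems.
    mixture = np.sum(list(targets.values()), axis=0)
    return mixture, targets
```

With a sampling rate of 44.1 kHz (an assumption; the report only mentions audio length), a 10 s training segment would correspond to `segment_len = 441000`.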
