
The SJTU X-LANCE Lab System for MSR Challenge 2025

Jinxuan Zhu, Hao Qiu, Haina Zhu, Jianwei Yu, Kai Yu, Xie Chen

TL;DR

The paper tackles music source restoration (MSR) for an $8$-instrument mastered mixture by introducing a sequential BS-RoFormer pipeline that denoises, separates, and dereverbs the signal, leveraging pretrained MSS checkpoints and targeted fine-tuning. It designs a cascaded separation scheme that starts from a frozen 6-stem model and adds four fine-tuned refinements to reach $8$ stems, paired with a restoration path that denoises all components and dereverbs vocals only. Training employs data cleaning/mixing, random mixture augmentation, and longer context up to $10$ s, with losses combining $L_1$ and multi-resolution STFT criteria. The approach achieves top performance on MSRBench, with an MMSNR of $4.4623$ and an FAD of $0.1988$, and is open-sourced at https://github.com/ModistAndrew/xlance-msr, signaling practical impact for professional remixing and historical audio restoration.
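As an illustration of what a loss combining $L_1$ and multi-resolution STFT terms can look like, here is a minimal pure-Python sketch. It is not the authors' implementation: the function names, the three `(n_fft, hop)` resolutions, the Hann window, the equal weighting, and the choice to apply $L_1$ to the waveform and to STFT magnitudes are all assumptions made for the example. The naive DFT is only practical for tiny demo signals.

```python
import cmath
import math


def stft_magnitudes(signal, n_fft, hop):
    """Magnitude STFT with a Hann window, computed by a naive DFT (demo only)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    magnitudes = []
    for start in range(0, len(signal) - n_fft + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(n_fft)]
        for k in range(n_fft // 2 + 1):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                for n in range(n_fft)
            )
            magnitudes.append(abs(acc))
    return magnitudes  # flattened (frames x bins)


def mean_abs_diff(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def l1_plus_mrstft_loss(estimate, target,
                        resolutions=((16, 4), (32, 8), (64, 16)),  # assumed values
                        stft_weight=1.0):                           # assumed weight
    """Waveform L1 plus the average L1 distance between STFT magnitudes."""
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    if len(target) < max(n_fft for n_fft, _ in resolutions):
        raise ValueError("signal is shorter than the largest STFT window")
    waveform_l1 = mean_abs_diff(estimate, target)
    stft_terms = [
        mean_abs_diff(stft_magnitudes(estimate, n_fft, hop),
                      stft_magnitudes(target, n_fft, hop))
        for n_fft, hop in resolutions
    ]
    return waveform_l1 + stft_weight * sum(stft_terms) / len(stft_terms)
```

Using several window sizes lets the spectral term penalize errors at both fine time resolution (short windows) and fine frequency resolution (long windows), while the waveform term keeps the estimate aligned in phase and amplitude.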

Abstract

This report describes the system submitted to the music source restoration (MSR) Challenge 2025. Our approach is composed of sequential BS-RoFormers, each dedicated to a single task: music source separation (MSS), denoising, or dereverberation. To support the 8 instruments specified in the task, we use pretrained checkpoints from the MSS community and fine-tune the MSS model with several training schemes, including (1) mixing and cleaning of datasets; (2) random mixture of music pieces for data augmentation; (3) scale-up of audio length. Our system ranked first on all three subjective and all three objective evaluation metrics, including an MMSNR score of 4.4623 and an FAD score of 0.1988. We have open-sourced all the code and checkpoints at https://github.com/ModistAndrew/xlance-msr.
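Training scheme (2), random mixture of music pieces, can be pictured with a small sketch. The abstract does not give the details, so everything below is an assumption: the `random_mixture` name, the `stem_pool` layout, the gain range, and the rule of drawing one clip per non-target instrument. It shows one common way to build a synthetic training pair by summing stems drawn from unrelated pieces.

```python
import random


def random_mixture(stem_pool, target_instrument, rng, gain_range=(0.5, 1.0)):
    """Build one (mixture, target) pair from stems of unrelated pieces.

    stem_pool maps an instrument name to a list of clips; each clip is a
    list of float samples, and all clips are assumed to share one length.
    """
    target = rng.choice(stem_pool[target_instrument])
    mixture = list(target)  # copy so the target stays untouched
    for instrument, clips in stem_pool.items():
        if instrument == target_instrument:
            continue
        clip = rng.choice(clips)            # may come from a different song
        gain = rng.uniform(*gain_range)     # assumed random level for variety
        for i in range(len(mixture)):
            mixture[i] += gain * clip[i]
    return mixture, target
```

Because the accompaniment no longer has to come from the same song as the target, the number of distinct training mixtures grows combinatorially with the size of the stem pool.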

Paper Structure (8 sections, 1 figure, 1 table)


Figures (1)

  • Figure 1: System architecture of the sequential BS-RoFormer framework. The input mixture is first processed by a denoising module to remove noise that would adversely affect the later separation process. The cleaned signal then passes through a frozen pretrained 6-stem separation model, followed by additional fine-tuned separation models that handle certain instruments. Finally, only the vocals stem undergoes dereverberation.
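
The data flow in the Figure 1 caption can be sketched as plain function composition. This is a hedged illustration, not the released code: `restore_sources` and its parameters are hypothetical names, each stage is an arbitrary callable standing in for a BS-RoFormer, and feeding the denoised mixture (rather than a residual or an intermediate stem) to the additional separators is an assumption, since the caption does not say.

```python
from typing import Callable, Dict, List, Sequence

Waveform = List[float]
Separator = Callable[[Waveform], Dict[str, Waveform]]


def restore_sources(
    mixture: Waveform,
    denoise: Callable[[Waveform], Waveform],
    base_separator: Separator,
    extra_separators: Sequence[Separator],
    dereverb: Callable[[Waveform], Waveform],
) -> Dict[str, Waveform]:
    """Denoise, separate with a base model plus extra models, dereverb vocals."""
    # Stage 1: remove noise before any separation takes place.
    cleaned = denoise(mixture)

    # Stage 2: the frozen pretrained model supplies the initial stems.
    stems = dict(base_separator(cleaned))

    # Stage 3: fine-tuned models add or override stems for certain instruments
    # (assumption: each one also takes the denoised mixture as input).
    for separator in extra_separators:
        stems.update(separator(cleaned))

    # Stage 4: only the vocals stem is dereverberated.
    if "vocals" in stems:
        stems["vocals"] = dereverb(stems["vocals"])
    return stems
```

Keeping each stage behind a simple callable mirrors the one-task-per-model design: any stage can be swapped or fine-tuned without touching the others.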