The SJTU X-LANCE Lab System for MSR Challenge 2025

Jinxuan Zhu; Hao Qiu; Haina Zhu; Jianwei Yu; Kai Yu; Xie Chen

TL;DR

The paper tackles music source restoration (MSR) for an $8$-instrument mastered mixture with a sequential BS-RoFormer pipeline that denoises, separates, and dereverberates the signal, building on pretrained MSS checkpoints and targeted fine-tuning. The cascaded separation scheme starts from a frozen 6-stem model and adds four fine-tuned refinement models to reach $8$ stems; it is paired with a restoration path that denoises all components and dereverberates vocals only. Training uses dataset cleaning and mixing, random-mixture augmentation, and a longer audio context of up to $10$ s, with a loss that combines $L_1$ and multi-resolution STFT terms. The approach achieves top performance on MSRBench, with an MMSNR of $4.4623$ and an FAD of $0.1988$, and is open-sourced at https://github.com/ModistAndrew/xlance-msr, with potential applications in professional remixing and historical audio restoration.
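As an illustration of how an $L_1$ term and a multi-resolution STFT term are commonly combined, one possible form of the objective is sketched here. The weights $\lambda_1, \lambda_2$, the number of resolutions $R$, and the use of magnitude spectra (rather than complex spectra) are assumptions for illustration; the summary above does not specify them.

$$
\mathcal{L}(\hat{s}, s) \;=\; \lambda_1 \,\lVert \hat{s} - s \rVert_1 \;+\; \lambda_2 \sum_{r=1}^{R} \bigl\lVert\, \lvert \mathrm{STFT}_r(\hat{s}) \rvert - \lvert \mathrm{STFT}_r(s) \rvert \,\bigr\rVert_1
$$

Here $s$ is the target stem, $\hat{s}$ is the model estimate, and $\mathrm{STFT}_r$ denotes the short-time Fourier transform at the $r$-th window and hop configuration. Using several resolutions penalizes errors at both fine time and fine frequency scales, which a single waveform-domain $L_1$ term does not capture on its own.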

Abstract

This report describes the system submitted to the Music Source Restoration (MSR) Challenge 2025. Our approach is composed of sequential BS-RoFormers, each handling a single task: music source separation (MSS), denoising, or dereverberation. To support the 8 instruments specified in the task, we use pretrained checkpoints from the MSS community and fine-tune the MSS model with several training schemes: (1) mixing and cleaning of datasets; (2) random mixing of music pieces for data augmentation; and (3) scaling up the audio length. Our system ranked first in all three subjective and all three objective evaluation metrics, with an MMSNR score of 4.4623 and an FAD score of 0.1988. We have open-sourced all the code and checkpoints at https://github.com/ModistAndrew/xlance-msr.
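Training scheme (2) can be pictured with a small sketch. The abstract does not describe the exact procedure, so the following assumes the common MSS practice of drawing each instrument's stem from a randomly chosen piece, cropping a random segment, and summing the stems into a new mixture. The function name `random_mixture`, the gain range, and the instrument names other than vocals are hypothetical, and plain Python lists stand in for waveform tensors.

```python
import random

def random_mixture(pieces, instruments, segment_len, rng):
    """Build one training example by drawing each stem from a random piece.

    pieces: list of dicts mapping instrument name -> list of float samples.
    Returns (mixture, targets), where targets maps instrument -> segment.
    """
    targets = {}
    for inst in instruments:
        # Only pieces that contain this instrument and are long enough qualify.
        candidates = [
            p[inst] for p in pieces
            if inst in p and len(p[inst]) >= segment_len
        ]
        if not candidates:
            # No usable stem anywhere: fall back to a silent target.
            targets[inst] = [0.0] * segment_len
            continue
        stem = rng.choice(candidates)
        start = rng.randrange(len(stem) - segment_len + 1)
        gain = rng.uniform(0.5, 1.0)  # assumed gain range, not from the paper
        targets[inst] = [gain * x for x in stem[start:start + segment_len]]
    # The mixture is the sample-wise sum of the selected stems.
    mixture = [sum(vals) for vals in zip(*targets.values())]
    return mixture, targets

# Toy usage: two "pieces" with partially overlapping instruments.
rng = random.Random(0)
pieces = [
    {"vocals": [0.1] * 8, "bass": [0.2] * 8},
    {"vocals": [0.3] * 8, "drums": [0.4] * 8},
]
mixture, targets = random_mixture(pieces, ["vocals", "bass", "drums"], 4, rng)
```

Because stems from unrelated pieces are combined, the resulting mixtures are not musically coherent, but they greatly increase the number of distinct training examples available from a fixed set of stems.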

Paper Structure (8 sections, 1 figure, 1 table)

This paper contains 8 sections, 1 figure, 1 table.

Introduction
Methodology
BS-RoFormer
Separation model
Restoration model
Training schemes
Evaluation
Conclusion

Figures (1)

Figure 1: System architecture of the sequential BS-RoFormer framework. The input mixture is first processed by a denoising module to remove noise that would adversely affect the subsequent separation. The cleaned signal then passes through a frozen pretrained 6-stem separation model, followed by additional fine-tuned separation models that handle specific instruments. Finally, only the vocals stem undergoes dereverberation.
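The data flow in Figure 1 can be summarized as a minimal sketch. It assumes each stage is a callable and that each fine-tuned model takes the denoised mixture and returns one stem; the caption does not state what the fine-tuned models receive as input, so that wiring is an assumption. The function `restore_sources`, the type alias `Audio`, and the stem names other than vocals are hypothetical and are not taken from the released code.

```python
from typing import Callable, Dict, List

Audio = List[float]  # toy stand-in for a waveform tensor

def restore_sources(
    mixture: Audio,
    denoise: Callable[[Audio], Audio],
    base_separator: Callable[[Audio], Dict[str, Audio]],
    extra_separators: Dict[str, Callable[[Audio], Audio]],
    dereverb: Callable[[Audio], Audio],
) -> Dict[str, Audio]:
    # 1) Denoise first so that noise does not disturb separation.
    cleaned = denoise(mixture)
    # 2) The frozen pretrained model produces the initial stems.
    stems = dict(base_separator(cleaned))
    # 3) Fine-tuned models add or replace specific instrument stems
    #    (assumed here to operate on the denoised mixture).
    for name, model in extra_separators.items():
        stems[name] = model(cleaned)
    # 4) Only the vocals stem is dereverberated.
    if "vocals" in stems:
        stems["vocals"] = dereverb(stems["vocals"])
    return stems

# Toy usage with stub stages in place of real BS-RoFormer models.
stems = restore_sources(
    mixture=[1.0, 2.0],
    denoise=lambda x: [v * 0.5 for v in x],
    base_separator=lambda x: {"vocals": list(x), "bass": list(x)},
    extra_separators={"percussion": lambda x: [0.0 for _ in x]},
    dereverb=lambda x: [v + 1.0 for v in x],
)
```

Keeping each stage behind its own callable mirrors the one-model-per-task design described in the caption: any stage can be swapped or retrained without touching the others.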
