
The Sound Demixing Challenge 2023 – Music Demixing Track

Giorgio Fabbro, Stefan Uhlich, Chieh-Hsin Lai, Woosung Choi, Marco Martínez-Ramírez, Weihsiang Liao, Igor Gadelha, Geraldo Ramos, Eddie Hsu, Hugo Rodrigues, Fabian-Robert Stöter, Alexandre Défossez, Yi Luo, Jianwei Yu, Dipam Chakraborty, Sharada Mohanty, Roman Solovyev, Alexander Stempkovskiy, Tatiana Habruseva, Nabarun Goswami, Tatsuya Harada, Minseok Kim, Jun Hyung Lee, Yuanliang Dong, Xinran Zhang, Jiafeng Liu, Yuki Mitsufuji

TL;DR

The paper introduces the SDX'23 Music Demixing Track, which emphasizes robust music source separation (MSS). It formalizes label-noise and bleeding errors in training data and releases SDXDB23_LabelNoise and SDXDB23_Bleeding to benchmark robustness to them. It details the challenge setup, the three leaderboards, and the SDR-based evaluation, supplemented by a listening test that captures perceptual quality. Robustness strategies used by the teams include iterative data refinement, loss-truncation-based training, multi-branch architectures with wavelet (DWT) components, and ensembles that fuse the outputs of time-domain, spectral, and residual models. The best system improves SDR by over 1.6 dB on MDXDB21 relative to the winner of the previous edition, and the paper offers practical insights for organizing future robust MSS challenges, highlighting the trade-offs among data cleaning, model design, and perceptual evaluation. Overall, the SDX'23 track advances robust MSS by providing datasets, benchmarks, and methods for reliable separation under realistic training-data imperfections.
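Loss truncation, one of the robustness strategies listed above, can be illustrated with a minimal sketch. It assumes a batch of per-example losses and discards a fixed fraction of the largest ones before averaging, on the premise that examples with corrupted targets (label noise or bleeding) tend to incur the highest loss. The function name `truncated_mean_loss` and the `drop_fraction` parameter are hypothetical, and the participating teams' exact recipes may differ.

```python
import math
from typing import Sequence


def truncated_mean_loss(losses: Sequence[float], drop_fraction: float = 0.5) -> float:
    """Average per-example losses after discarding the largest ones.

    Hypothetical helper: the highest-loss examples are treated as the most
    likely to have noisy targets and are excluded from the average.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if not 0.0 <= drop_fraction < 1.0:
        raise ValueError("drop_fraction must be in [0, 1)")
    # Number of examples to keep; always keep at least one.
    keep = max(1, math.ceil(len(losses) * (1.0 - drop_fraction)))
    kept = sorted(losses)[:keep]  # smallest losses first
    return sum(kept) / len(kept)


# The outlier 5.0 is dropped; the result is the mean of 0.1, 0.2 and 0.3.
print(truncated_mean_loss([0.1, 0.2, 0.3, 5.0], drop_fraction=0.25))
```

Dropping by rank rather than by a fixed loss threshold keeps the number of retained examples constant from batch to batch, at the cost of always discarding some clean examples.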

Abstract

This paper summarizes the music demixing (MDX) track of the Sound Demixing Challenge (SDX'23). We provide a summary of the challenge setup and introduce the task of robust music source separation (MSS), i.e., training MSS models in the presence of errors in the training data. We propose a formalization of the errors that can occur in the design of a training dataset for MSS systems and introduce two new datasets that simulate such errors: SDXDB23_LabelNoise and SDXDB23_Bleeding. We describe the methods that achieved the highest scores in the competition. Moreover, we present a direct comparison with the previous edition of the challenge (the Music Demixing Challenge 2021): when evaluated on MDXDB21, the best-performing system achieved an improvement of over 1.6 dB in signal-to-distortion ratio over the winner of the previous competition. Besides relying on the signal-to-distortion ratio as the objective metric, we also performed a listening test with renowned producers and musicians to study the perceptual quality of the systems, and we report the results here. Finally, we provide our insights into the organization of the competition and our prospects for future editions.
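The comparison above is expressed in signal-to-distortion ratio (SDR). The sketch below assumes a track-level definition in the style of the previous Music Demixing Challenge, SDR = 10·log10((Σ s² + ε) / (Σ (s − ŝ)² + ε)) per stem, averaged over the stems of a song. The helper names, ε = 1e-7, and the treatment of audio as a flat sequence of samples are assumptions made for illustration.

```python
import math
from typing import Dict, Sequence


def sdr(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-7) -> float:
    """Track-level SDR in dB for one stem (assumed definition, see lead-in)."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(s * s for s in reference)
    error_energy = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def song_sdr(references: Dict[str, Sequence[float]],
             estimates: Dict[str, Sequence[float]]) -> float:
    """Average the per-stem SDRs of one song (e.g. bass, drums, other, vocals)."""
    return sum(sdr(references[name], estimates[name]) for name in references) / len(references)


ref = {"vocals": [1.0, 0.0], "bass": [0.5, 0.5]}
est = {"vocals": [0.9, 0.0], "bass": [0.5, 0.5]}
print(round(sdr(ref["vocals"], est["vocals"]), 2))  # 20.0
print(song_sdr(ref, est))
```

For a single stem with fixed reference energy, a 1.6 dB gain in SDR corresponds to dividing the error energy by 10^0.16 ≈ 1.45.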

Paper Structure

This paper contains 36 sections, 5 equations, 10 figures, and 13 tables.

Figures (10)

  • Figure 1: Comparison of validation loss when training the same model on a small dataset (red), a large dataset with errors (purple) and the same large dataset once the errors have been corrected (green). All experiments were evaluated on the same validation set.
  • Figure 2: Statistics collected during our internal data cleaning activity. The values in each row are normalized so that they sum to 1. For example, across all the errors we found in our internal data, the probability that a guitar was labeled as bass is 32%.
  • Figure 3: The process of cleaning the stems of one song in the noisy dataset using the proposed robust baseline model. We propose two different methods: filtered and redistributed.
  • Figure 4: Results of the listening test.
  • Figure 5: Results of the listening test by assessor category.
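The row normalization described in the Figure 2 caption can be sketched as follows, assuming the statistics are stored as raw error counts indexed by the true instrument (rows) and the wrong label it received (columns). The counts below are made up, and `sample_wrong_label` is a hypothetical illustration of how such a matrix could drive label-noise simulation, not the procedure used to build SDXDB23_LabelNoise.

```python
import random
from typing import Dict

Counts = Dict[str, Dict[str, int]]
Probs = Dict[str, Dict[str, float]]


def row_normalize(counts: Counts) -> Probs:
    """Convert error counts (true label -> assigned label -> count) into
    per-row probabilities that sum to 1. Rows with no errors are skipped."""
    probs: Probs = {}
    for true_label, row in counts.items():
        total = sum(row.values())
        if total == 0:
            continue  # no errors observed for this instrument
        probs[true_label] = {label: n / total for label, n in row.items()}
    return probs


def sample_wrong_label(probs: Probs, true_label: str, rng: random.Random) -> str:
    """Hypothetical: draw the incorrect label that a stem of `true_label` receives."""
    row = probs[true_label]
    labels = list(row)
    return rng.choices(labels, weights=[row[label] for label in labels], k=1)[0]


# Made-up counts for illustration only.
counts = {"guitar": {"bass": 3, "keys": 1}, "vocals": {"other": 2}}
probs = row_normalize(counts)
print(probs["guitar"])  # {'bass': 0.75, 'keys': 0.25}
print(sample_wrong_label(probs, "guitar", random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the simulated corruption reproducible across runs.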