Music Source Restoration

Yongyi Zang, Zheqi Dai, Mark D. Plumbley, Qiuqiang Kong

TL;DR

Music Source Restoration (MSR) addresses the gap between idealized source separation and real-world music production by modeling mixtures as degraded sums of sources and aiming to recover undegraded originals. The authors introduce RawStems, a large-scale dataset of unprocessed stems with hierarchical instrument annotations, and propose U-Former as a baseline to validate MSR feasibility. They define a degradation set capturing common production effects and evaluate using SI-SDR and SSIM, highlighting the intrinsic difficulty of MSR and the need for dedicated methods and evaluation frameworks. The work provides public data, code, and models to catalyze research toward practical, production-ready music-stem restoration and remixing tools.

Abstract

We introduce Music Source Restoration (MSR), a novel task addressing the gap between idealized source separation and real-world music production. Current Music Source Separation (MSS) approaches assume mixtures are simple sums of sources, ignoring signal degradations applied during music production, such as equalization, compression, and reverb. MSR models mixtures as degraded sums of individually degraded sources, with the goal of recovering the original, undegraded signals. Due to the lack of data for MSR, we present RawStems, a dataset annotation of 578 songs with unprocessed source signals organized into 8 primary and 17 secondary instrument groups, totaling 354.13 hours. To the best of our knowledge, RawStems is the first dataset that contains unprocessed music stems with hierarchical categories. We consider spectral filtering, dynamic range compression, harmonic distortion, reverb, and lossy codecs as possible degradations, and establish U-Former as a baseline method, demonstrating the feasibility of MSR on our dataset. We publicly release the RawStems dataset annotations, degradation simulation pipeline, training code, and pre-trained models.
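
To make the signal model in the abstract concrete, the following is a minimal NumPy sketch of a mixture formed as a degraded sum of individually degraded sources, together with the SI-SDR metric mentioned in the TL;DR. It is not the authors' degradation pipeline: the function names (`soft_clip`, `moving_average_lowpass`, `degraded_mixture`, `si_sdr`), the tanh waveshaper standing in for harmonic distortion, the moving-average filter standing in for spectral filtering, and the two sine-wave "stems" are all assumptions chosen for illustration.

```python
import numpy as np

def soft_clip(x, drive=4.0):
    """Hypothetical stand-in for harmonic distortion: a tanh waveshaper,
    scaled so that an input of +/-1 maps to +/-1."""
    return np.tanh(drive * x) / np.tanh(drive)

def moving_average_lowpass(x, width=8):
    """Hypothetical stand-in for spectral filtering: a crude FIR low-pass."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

def degraded_mixture(sources, source_degradations, mix_degradation):
    """Degrade each source, sum the results, then degrade the sum."""
    degraded = [d(s) for d, s in zip(source_degradations, sources)]
    return mix_degradation(np.sum(degraded, axis=0))

def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two 1-D signals (zero-mean variant)."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Project the estimate onto the reference to find the optimal scaling.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps))

# Two toy "stems": one second of sine tones at an assumed 16 kHz sample rate.
sr = 16000
t = np.arange(sr) / sr
bass = 0.5 * np.sin(2 * np.pi * 110.0 * t)
lead = 0.3 * np.sin(2 * np.pi * 880.0 * t)

mixture = degraded_mixture(
    sources=[bass, lead],
    source_degradations=[soft_clip, moving_average_lowpass],
    mix_degradation=lambda x: soft_clip(x, drive=1.5),
)

print("SI-SDR, distorted bass vs. clean bass (dB):", si_sdr(soft_clip(bass), bass))
```

Under these assumptions, an MSS system would target the degraded stem `soft_clip(bass)`, whereas an MSR system would target the clean `bass` given only `mixture`. The printed SI-SDR indicates how far the degraded stem already is from the clean one before any separation error is added.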

Paper Structure

This paper contains 16 sections, 4 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: Analysis of the RawStems dataset at the first and second stem levels. Stem distribution is reported in terms of active playing time.
  • Figure 2: t-SNE visualization for mixture-level CLAP embeddings of RawStems, MUSDB18-HQ and MoisesDB. Best viewed in color.
  • Figure 3: t-SNE visualization for track-level CLAP embeddings of all individual audio files within RawStems. Best viewed in color.
  • Figure 4: Bayesian analysis for co-occurrence of instrument stems.
  • Figure 5: Model architecture of U-Former.