The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval

Jaime Garcia-Martinez, David Diaz-Guerra, John Anderson, Ricardo Falcon-Perez, Pablo Cabañas-Molero, Tuomas Virtanen, Julio J. Carabias-Orti, Pedro Vera-Candeas

TL;DR

The Spheres dataset addresses the scarcity of publicly available multitrack orchestral data for music source separation in classical music by providing isolated stems for each instrument across main, ambient, and close mics, recorded in a controlled studio with measured room acoustics. The authors propose a careful recording and data-structuring workflow, including RIR estimates from instrument positions and a mixture framework that preserves realistic bleed, enabling both separation and dereverberation research. Baseline experiments using X-UMX demonstrate potential gains in instrument-family separation and close-mic debleeding while highlighting generalization challenges across recording setups. By releasing all materials, scripts, and RIRs, The Spheres establishes a valuable benchmark for evaluating MSS, localization, and immersive rendering in realistic classical-music scenarios, bridging synthetic data and real-world orchestral applications.

Abstract

This paper introduces The Spheres dataset, a collection of multitrack orchestral recordings designed to advance machine learning research in music source separation and related MIR tasks within the classical music domain. The dataset comprises over one hour of recordings of musical pieces performed by the Colibrì Ensemble at The Spheres recording studio, capturing two canonical works - Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40 - along with chromatic scales and solo excerpts for each instrument. The recording setup employed 23 microphones, including close spot, main, and ambient microphones, enabling the creation of realistic stereo mixes with controlled bleeding and providing isolated stems for supervised training of source separation models. In addition, room impulse responses were estimated for each instrument position, offering valuable acoustic characterization of the recording space. We present the dataset structure, acoustic analysis, and baseline evaluations using X-UMX-based models for orchestral family separation and microphone debleeding. Results highlight both the potential and the challenges of source separation in complex orchestral scenarios, underscoring the dataset's value for benchmarking and for exploring new approaches to separation, localization, dereverberation, and immersive rendering of classical music.
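
To make the notion of microphone bleed more concrete, the following is a minimal sketch of how a close microphone can pick up attenuated copies of neighbouring instruments. It is a deliberately simplified, instantaneous model and not the dataset's recording or mixing procedure: real bleed also involves delays and room filtering. The function name `simulate_bleed`, the single `bleed_gain` value, and the sine-wave "instruments" are all assumptions made for illustration.

```python
import numpy as np


def simulate_bleed(sources, bleed_gain=0.1):
    """Return toy close-mic signals from isolated sources.

    sources: array of shape (n_sources, n_samples), one row per instrument.
    bleed_gain: hypothetical gain with which every other instrument leaks
        into a given close microphone (assumed identical for all pairs).
    """
    sources = np.asarray(sources, dtype=np.float64)
    n_sources = sources.shape[0]
    # Mixing matrix: 1.0 on the diagonal (the mic's own instrument),
    # bleed_gain everywhere else (leakage from the other instruments).
    mixing = np.full((n_sources, n_sources), bleed_gain)
    np.fill_diagonal(mixing, 1.0)
    return mixing @ sources


# Toy example: two synthetic "instruments" (assumed, not dataset audio).
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
violin = 0.5 * np.sin(2 * np.pi * 440.0 * t)
bassoon = 0.5 * np.sin(2 * np.pi * 110.0 * t)
sources = np.stack([violin, bassoon])

mics = simulate_bleed(sources, bleed_gain=0.1)
```

In this toy setting, a debleeding model would take a row of `mics` as input and be trained to recover the matching row of `sources`, which is why isolated stems are needed as supervised targets.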

Paper Structure

This paper contains 14 sections, 5 equations, 12 figures, 5 tables.

Figures (12)

  • Figure 1: A photo of the studio used for the recordings. Each musician was allocated a seat in the studio, with seating following the typical layout used for orchestral performance.
  • Figure 2: The approximate placement of instruments and microphones (indicated by M#) in the recording room. Each rounded square indicates the seat of a musician.
  • Figure 3: A photograph illustrating a session of recording one bassoon line.
  • Figure 4: Estimated RIR for the Violin 2 microphone with the source located at the Violin 2 position. The plot shows the RIR in dB as a function of sample index, including annotations for the microphone index (7) and name (Vln2).
  • Figure 5: Time played by every instrument in The Spheres dataset.
  • ...and 7 more figures
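
The caption of Figure 4 describes an RIR plotted in dB against sample index. One common way to produce such a curve can be sketched as follows. This is a sketch under stated assumptions, not the authors' plotting code: the helper name `rir_to_db`, the peak normalization, the -120 dB floor, and the synthetic decaying-noise RIR are all hypothetical choices.

```python
import numpy as np


def rir_to_db(rir, floor_db=-120.0):
    """Convert an impulse response to a peak-normalized magnitude in dB.

    The peak normalization and the floor value are assumptions; the floor
    only prevents log10(0) for exactly-zero samples.
    """
    rir = np.asarray(rir, dtype=np.float64)
    peak = np.max(np.abs(rir))
    if peak == 0.0:
        # An all-zero response has no defined level; return the floor.
        return np.full(rir.shape, floor_db)
    magnitude = np.abs(rir) / peak
    floor = 10.0 ** (floor_db / 20.0)
    return 20.0 * np.log10(np.maximum(magnitude, floor))


# Synthetic stand-in for a measured RIR: exponentially decaying noise.
rng = np.random.default_rng(0)
n_samples = 4000
toy_rir = rng.standard_normal(n_samples) * np.exp(-np.arange(n_samples) / 500.0)

toy_rir_db = rir_to_db(toy_rir)
sample_index = np.arange(n_samples)  # x-axis for a plot of toy_rir_db
```

Plotting `toy_rir_db` against `sample_index` would give a curve with its maximum at 0 dB and a decaying tail, similar in form to the kind of plot the caption describes.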