ACMID: Automatic Curation of Musical Instrument Dataset for 7-Stem Music Source Separation
Ji Yu, Yang Shuo, Xu Yuetonghui, Liu Mengmei, Ji Qiang, Han Zerui
TL;DR
This work tackles data scarcity and metadata mismatch in music source separation (MSS) by introducing ACMID, a large-scale dataset crawled from YouTube and refined with automatic, per-instrument binary classifiers built on a frozen pretrained audio encoder. The authors define a seven-stem instrument taxonomy, implement multilingual web crawling, and clean the crawled data to yield ACMID-Cleaned, which enables higher-granularity MSS training. Empirical results show that cleaning matters (an MSS model trained on ACMID-Cleaned gains 2.39 dB SDR over one trained on ACMID-Uncleaned) and that the cleaned dataset is useful for data augmentation (an average SDR gain of 1.16 dB when adding ACMID-Cleaned to MoisesDB+MedleyDB). The work provides open-source crawling and cleaning tools, facilitating reproducible, scalable development of high-granularity MSS systems.
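To give a feel for what an SDR gain of this size means, the sketch below computes the basic signal-to-distortion ratio, 10·log10(‖s‖² / ‖s − ŝ‖²), and converts a dB gain into the corresponding reduction in error energy. It is an illustration only: the summary does not state which SDR variant the authors use, and the function names `sdr_db` and `error_energy_ratio` are hypothetical.

```python
import math


def sdr_db(reference, estimate, eps=1e-10):
    """Basic SDR in dB: 10 * log10(||s||^2 / ||s - s_hat||^2).

    Assumption: this is the plain energy-ratio definition; the paper may
    use a different variant (for example, chunk-wise or scale-invariant).
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps guards against division by zero and log of zero.
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


def error_energy_ratio(gain_db):
    """Factor by which error energy shrinks for a given SDR gain,
    assuming the reference signal is unchanged."""
    return 10.0 ** (-gain_db / 10.0)


if __name__ == "__main__":
    reference = [1.0, 0.0, -1.0, 0.0]
    estimate = [0.9 * x for x in reference]  # 10% amplitude error
    print(f"SDR: {sdr_db(reference, estimate):.2f} dB")
    print(f"Error-energy factor for +1.16 dB: {error_energy_ratio(1.16):.3f}")
```

Under this definition, a 1.16 dB gain on a fixed reference corresponds to the residual error energy dropping to roughly 77% of its previous value.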
Abstract
Most current music source separation (MSS) methods rely on supervised learning and are therefore limited by the quantity and quality of training data. Although web crawling can provide abundant data, platform-level track labeling often causes metadata mismatches, impeding the acquisition of accurate "audio-label" pairs. To address this, we present ACMID: a dataset for MSS built by crawling extensive raw data from the web and then cleaning it automatically with an instrument classifier, built on a pre-trained audio encoder, that filters and aggregates clean segments of target instruments from the crawled tracks. The result is the refined ACMID-Cleaned dataset. Leveraging this abundant data, we expand the conventional 4-stem classification (Vocal/Bass/Drums/Others) to a 7-stem one (Piano/Drums/Bass/Acoustic Guitar/Electric Guitar/Strings/Wind-Brass), enabling high-granularity MSS systems. Experiments on a SOTA MSS model demonstrate two key results: (i) the MSS model trained with ACMID-Cleaned achieved a 2.39 dB improvement in SDR over the one trained with ACMID-Uncleaned, demonstrating the effectiveness of our data cleaning procedure; (ii) incorporating ACMID-Cleaned into training improves the MSS model's average performance by 1.16 dB, confirming the value of our dataset. Our data crawling code, cleaning model code, and weights are available at: https://github.com/scottishfold0621/ACMID.
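The filter-and-aggregate step described above can be sketched as follows. This is a minimal illustration, not the authors' released implementation: the `Segment` type, the 0.5 threshold, the 5-second segment length, and the `max_gap` merge rule are all assumptions, and the per-segment probabilities are made-up stand-ins for the output of a per-instrument binary classifier.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds
    prob: float   # hypothetical classifier score for the target instrument


def select_clean_segments(segments, threshold=0.5):
    """Keep segments whose classifier score meets the (assumed) threshold."""
    return [s for s in segments if s.prob >= threshold]


def merge_adjacent(segments, max_gap=0.0):
    """Aggregate kept segments that touch or lie within max_gap seconds
    of each other into (start, end) spans."""
    merged = []
    for seg in sorted(segments, key=lambda s: s.start):
        if merged and seg.start - merged[-1][1] <= max_gap:
            # Extend the previous span instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], seg.end))
        else:
            merged.append((seg.start, seg.end))
    return merged


if __name__ == "__main__":
    # Made-up scores for four consecutive 5-second segments of one track.
    scored = [
        Segment(0.0, 5.0, 0.90),
        Segment(5.0, 10.0, 0.80),
        Segment(10.0, 15.0, 0.20),  # rejected: target instrument unlikely
        Segment(15.0, 20.0, 0.95),
    ]
    clean_spans = merge_adjacent(select_clean_segments(scored))
    print(clean_spans)
```

In a 7-stem setting, one such pass would presumably run per target instrument, each with its own binary classifier.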
