Source Separation for A Cappella Music
Luca A. Lanzendörfer, Constantin Pinkl, Florian Grötschla
TL;DR
The paper tackles multi-singer source separation in a cappella music when the number of active singers varies. It introduces SepACap, a waveform-domain model derived from SepReformer that uses periodic activations and a silence-aware composite loss. A power-set data augmentation strategy generates $2^n-1$ mixtures per clip (one for each non-empty subset of the $n$ singers), enabling joint separation and detection of active singers and improving generalization to mixtures with missing singers. On the JaCappella dataset, SepACap achieves state-of-the-art results for full ensembles and remains robust on singer subsets with reduced bleed-through, albeit at the cost of some added artifacts. These contributions support practical applications in transcription, remixing, and analysis of diverse a cappella performances while reducing the data requirements for training multi-singer separation models.
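As an illustration of the power-set augmentation, the following minimal sketch enumerates every non-empty subset of a clip's stems and sums the selected stems into a mixture, setting the targets of inactive singers to silence. The function name `power_set_mixtures`, the stem names, and the dict-based data layout are assumptions made for this example, not the authors' implementation.

```python
from itertools import combinations

import numpy as np


def power_set_mixtures(stems):
    """Yield (active_names, mixture, targets) for every non-empty subset.

    stems: dict mapping a (hypothetical) singer name to a 1-D waveform;
    all waveforms are assumed to have the same length.
    """
    names = sorted(stems)
    for k in range(1, len(names) + 1):
        for active in combinations(names, k):
            # The mixture contains only the active singers.
            mixture = np.sum([stems[n] for n in active], axis=0)
            # Inactive singers get an all-zero (silent) target.
            targets = {
                n: stems[n] if n in active else np.zeros_like(stems[n])
                for n in names
            }
            yield active, mixture, targets


# Hypothetical six-voice clip with random placeholder audio.
rng = np.random.default_rng(0)
voices = ["alto", "bass", "lead", "percussion", "soprano", "tenor"]
clip = {v: rng.standard_normal(16000) for v in voices}
examples = list(power_set_mixtures(clip))
print(len(examples))  # 2**6 - 1 non-empty subsets
```

For a clip with $n = 6$ stems this yields $2^6 - 1 = 63$ mixtures, each paired with targets that tell the model which singers are silent.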
Abstract
In this work, we study the task of multi-singer separation in a cappella music, where the number of active singers varies across mixtures. To address this, we use a power-set-based data augmentation strategy that expands limited multi-singer datasets into exponentially more training samples. To separate singers, we introduce SepACap, an adaptation of SepReformer, a state-of-the-art speaker separation architecture. We adapt the model with periodic activations and a composite loss function that remains effective when stems are silent, enabling robust detection and separation. Experiments on the JaCappella dataset demonstrate that our approach achieves state-of-the-art performance in both full-ensemble and subset singer separation scenarios, outperforming spectrogram-based baselines while generalizing to realistic mixtures with varying numbers of singers.
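The need for a loss that tolerates silent stems can be illustrated with a small sketch. Scale-invariant SDR, a common separation objective, projects the estimate onto the target and is therefore undefined when the target is all zeros. The example below is only an assumed illustration of one way to handle this, not the composite loss used in the paper: active stems are scored with negative SI-SDR, and silent stems are penalized by the log energy of whatever the model outputs for them. The names `silence_aware_loss`, `EPS`, and `silence_threshold` and their values are hypothetical.

```python
import numpy as np

EPS = 1e-8  # assumed numerical floor, not taken from the paper


def si_sdr(est, tgt):
    """Scale-invariant SDR in dB for two 1-D signals."""
    scale = np.dot(est, tgt) / (np.dot(tgt, tgt) + EPS)
    proj = scale * tgt    # part of the estimate explained by the target
    noise = est - proj    # everything else counts as error
    return 10.0 * np.log10((np.dot(proj, proj) + EPS) / (np.dot(noise, noise) + EPS))


def silence_aware_loss(est, tgt, silence_threshold=1e-6):
    """Average per-stem loss; est and tgt have shape (n_stems, n_samples)."""
    total = 0.0
    for e, t in zip(est, tgt):
        if np.mean(t ** 2) > silence_threshold:
            # Active stem: maximize SI-SDR.
            total += -si_sdr(e, t)
        else:
            # Silent stem: penalize any residual energy (bleed-through).
            total += 10.0 * np.log10(np.mean(e ** 2) + EPS)
    return total / len(tgt)


# One active stem (a sine tone) and one silent stem.
time = np.arange(8000) / 8000.0
tgt = np.stack([np.sin(2 * np.pi * 220.0 * time), np.zeros(8000)])
est_clean = tgt.copy()
est_bleed = tgt.copy()
est_bleed[1] = 0.1 * tgt[0]  # the active singer leaks into the silent stem
loss_clean = silence_aware_loss(est_clean, tgt)
loss_bleed = silence_aware_loss(est_bleed, tgt)
```

In this sketch the estimate with leakage into the silent stem receives a higher loss than the clean estimate, and both values stay finite even though one target is entirely zero.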
