JaCappella Corpus: A Japanese a Cappella Vocal Ensemble Corpus
Tomohiko Nakamura, Shinnosuke Takamichi, Naoko Tanji, Satoru Fukayama, Hiroshi Saruwatari
TL;DR
The paper introduces jaCappella, a Japanese a cappella vocal ensemble corpus designed for vocal ensemble separation and synthesis. It comprises 35 songs arranged from out-of-copyright children's tunes. Each song has six voice parts (Vo, S, A, T, Bs, VP), and the songs are grouped into seven genre-based subsets. The recordings are provided as monaural 24-bit WAVs with MusicXML scores. The authors analyze lexical versus non-lexical syllables. They also demonstrate that the corpus is a challenging resource for separation by evaluating X-UMX, DPTNet, and MRDLA in a singer-closed setting, using the scale-invariant signal-to-distortion ratio (SI-SDR) as the evaluation metric; waveform-based methods perform best. The corpus thus supports both separation and synthesis research on vocal ensembles across varied styles and genres in music information retrieval (MIR).
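SI-SDR, the metric named above, follows a standard definition. For a reference $s$ and an estimate $\hat{s}$, the optimal scaling is $\alpha = \langle \hat{s}, s \rangle / \lVert s \rVert^2$, and $\mathrm{SI\text{-}SDR} = 10 \log_{10} \left( \lVert \alpha s \rVert^2 / \lVert \alpha s - \hat{s} \rVert^2 \right)$. The following is a minimal NumPy sketch of that definition. It assumes single-channel signals of equal length and removes the mean from both signals first. The summary does not describe the authors' evaluation code, including whether scores are computed per track or per segment. Treat the sketch as illustrative only; the function name `si_sdr` and the `eps` guard are hypothetical choices.

```python
import numpy as np


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB (illustrative sketch, not the authors' code).

    Assumes 1-D, equal-length signals. `eps` is a hypothetical guard
    against division by zero and log of zero.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if estimate.shape != reference.shape or estimate.ndim != 1:
        raise ValueError("expected two 1-D signals of equal length")
    # Zero-mean both signals, as in the usual SI-SDR definition.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling alpha = <s_hat, s> / ||s||^2 projects the estimate onto the reference.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything in the estimate not explained by the scaled reference counts as distortion.
    distortion = estimate - target
    return float(10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)))


if __name__ == "__main__":
    t = np.arange(16000) / 16000
    ref = np.sin(2 * np.pi * 5 * t)
    # Adding an orthogonal component with 1/100 of the reference energy gives about 20 dB.
    est = ref + 0.1 * np.cos(2 * np.pi * 5 * t)
    print(round(si_sdr(est, ref), 3))
```

Because the reference is rescaled by $\alpha$, multiplying the estimate by any nonzero constant leaves the score unchanged (up to the `eps` guard). This scale invariance is why SI-SDR is preferred over plain SDR when separated voice parts may come out at a different gain than the reference.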
Abstract
We construct a corpus of Japanese a cappella vocal ensembles (jaCappella corpus) for vocal ensemble separation and synthesis. It consists of 35 copyright-cleared vocal ensemble songs and audio recordings of their individual voice parts. These songs were arranged from out-of-copyright Japanese children's songs and have six voice parts (lead vocal, soprano, alto, tenor, bass, and vocal percussion). They are divided into seven subsets, each of which features typical characteristics of a music genre such as jazz and enka. This variety in genre and voice parts matches the vocal ensembles that have recently become widespread on social media services such as YouTube. In contrast, conventional vocal ensemble datasets mainly target choral singing made up of soprano, alto, tenor, and bass. Experimental evaluation demonstrates that our corpus is a challenging resource for vocal ensemble separation. Our corpus is available on our project page (https://tomohikonakamura.github.io/jaCappella_corpus/).
