JVS-MuSiC: Japanese multispeaker singing-voice corpus

Hiroki Tamaru, Shinnosuke Takamichi, Naoko Tanji, Hiroshi Saruwatari

TL;DR

This paper presents JVS-MuSiC, an open multispeaker Japanese singing-voice corpus in which 100 singers each perform the children's song Katatsumuri and a second song that differs per singer, designed to enable analysis and synthesis across voices. The authors modify voices with Melodyne to align key and tempo within and across groups, and build similarity and oneness matrices to quantify perceptual relations among singing-voice similarity, unison oneness, and speech similarity. Results show a positive, moderate correlation between singing-voice similarity and unison oneness ($r = 0.45$, $p = 1.4\times 10^{-39}$) and a weak correlation with speech similarity ($r = 0.17$, $p = 1.9\times 10^{-6}$), suggesting that perceived singing-voice similarity is only weakly related to perceived speech similarity. The dataset, licensed CC BY-SA 4.0 and freely available for non-commercial research, is a resource for multispeaker singing-voice analysis and synthesis and for unison-voice research in Japanese.

Abstract

Thanks to developments in machine learning techniques, it has become possible to synthesize high-quality singing voices of a single singer. An open multispeaker singing-voice corpus would further accelerate the research in singing-voice synthesis. However, conventional singing-voice corpora only consist of the singing voices of a single singer. We designed a Japanese multispeaker singing-voice corpus called "JVS-MuSiC" with the aim to analyze and synthesize a variety of voices. The corpus consists of 100 singers' recordings of the same song, Katatsumuri, which is a Japanese children's song. It also includes another song that is different for each singer. In this paper, we describe the design of the corpus and experimental analyses using JVS-MuSiC. We investigated the relationship between 1) the similarity of singing voices and perceptual oneness of unison singing voices and between 2) the similarity of singing voices and that of speech. The results suggest that 1) there is a positive and moderate correlation between singing-voice similarity and the oneness of unison and that 2) the correlation between singing-voice similarity and speech similarity is weak. This corpus is freely available online.

Paper Structure

This paper contains 14 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Scatter plot of key and tempo.
  • Figure 2: Scatter plot of average similarity and oneness of unison. Correlation coefficient is 0.45 and $p$-value is $1.4\times 10^{-39}$.
  • Figure 3: Scatter plot of average similarity of singing voice and speech. Correlation coefficient is 0.17 and $p$-value is $1.9\times 10^{-6}$.
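The correlation coefficients quoted for Figures 2 and 3 relate two sets of pairwise scores. As a minimal sketch of that kind of computation, the snippet below flattens two symmetric score matrices into their upper triangles and computes a Pearson coefficient. The choice of Pearson correlation, the upper-triangle flattening, the helper names `upper_triangle` and `pearson_r`, and the toy values are all assumptions for illustration; none of them is taken from the paper or the corpus.

```python
import math
from itertools import combinations


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square matrix into a list."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Toy 4-singer matrices with made-up scores (NOT data from the corpus).
similarity = [
    [0.0, 0.8, 0.2, 0.5],
    [0.8, 0.0, 0.4, 0.6],
    [0.2, 0.4, 0.0, 0.3],
    [0.5, 0.6, 0.3, 0.0],
]
oneness = [
    [0.0, 0.7, 0.3, 0.4],
    [0.7, 0.0, 0.5, 0.6],
    [0.3, 0.5, 0.0, 0.2],
    [0.4, 0.6, 0.2, 0.0],
]

# Each singer pair contributes one (similarity, oneness) point.
r = pearson_r(upper_triangle(similarity), upper_triangle(oneness))
print(f"r = {r:.2f}")
```

Only the strict upper triangle is used so that each singer pair is counted once and the diagonal (a singer compared with themselves) is excluded. A significance test for the resulting coefficient would be a separate step and is not shown here.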