FruitsMusic: A Real-World Corpus of Japanese Idol-Group Songs

Hitoshi Suda, Shunsuke Yoshida, Tomohiko Nakamura, Satoru Fukayama, Jun Ogata

TL;DR

FruitsMusic addresses the shortage of practical music information retrieval (MIR) resources by providing a real-world, YouTube-derived corpus of Japanese idol-group songs with precise, per-segment singer annotations. The authors assemble 163 minutes of audio across 40 songs by 18 groups. The annotations are distributed as JSON, RTTM, and lyrics files and are split into Subset A for training and Subset B for evaluation, following explicit song-selection and licensing rules. They demonstrate the dataset's utility by evaluating singer embeddings and diarization under real-world conditions, including a synthesized multi-singer dataset used to stress-test the methods. Results show that singer embeddings and diarization benefit from FruitsMusic and from vocal separation, but accurately distinguishing multiple short solo segments remains difficult. This points to clear room for future improvement and underlines the practical value of the resource for MIR and fan-centric applications.
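The TL;DR notes that singer annotations are distributed as RTTM files. Assuming these follow the standard NIST RTTM layout (10 whitespace-separated fields, with onset in field 4, duration in field 5, and speaker name in field 8), a minimal sketch for loading them and totalling each singer's singing time could look like this. The file ID `song01` and the labels `singerA`/`singerB` are hypothetical placeholders, not the corpus's actual identifiers.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    file_id: str
    onset: float     # segment start, in seconds
    duration: float  # segment length, in seconds
    singer: str      # speaker-name field of the RTTM line

    @property
    def offset(self) -> float:
        return self.onset + self.duration


def parse_rttm(text: str) -> list:
    """Parse SPEAKER records of a standard NIST RTTM file (assumed layout)."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] != "SPEAKER":
            continue  # skip blank lines and non-SPEAKER record types
        segments.append(
            Segment(
                file_id=fields[1],
                onset=float(fields[3]),
                duration=float(fields[4]),
                singer=fields[7],
            )
        )
    return segments


def singing_time_per_singer(segments) -> dict:
    """Sum segment durations per singer; a shared segment counts for each singer."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.singer] += seg.duration
    return dict(totals)


# Hypothetical example: two solo lines, then both singers together (7.5 s to 10.0 s).
EXAMPLE_RTTM = """\
SPEAKER song01 1 0.00 4.50 <NA> <NA> singerA <NA> <NA>
SPEAKER song01 1 4.50 3.00 <NA> <NA> singerB <NA> <NA>
SPEAKER song01 1 7.50 2.50 <NA> <NA> singerA <NA> <NA>
SPEAKER song01 1 7.50 2.50 <NA> <NA> singerB <NA> <NA>
"""

print(singing_time_per_singer(parse_rttm(EXAMPLE_RTTM)))  # {'singerA': 7.0, 'singerB': 5.5}
```

Representing a multi-singer segment as one RTTM line per singer with identical timestamps is the usual way RTTM expresses overlap; whether FruitsMusic encodes shared segments exactly this way should be checked against the released files.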

Abstract

This study presents FruitsMusic, a metadata corpus of real-world Japanese idol-group songs, precisely annotated with who sings what and when. Japanese idol-group songs, a vital part of Japanese pop culture, feature a distinctive vocal arrangement style in which each song is divided into several segments and each segment is assigned to one specific singer or to multiple singers. To advance singer diarization methods that can recognize such structures, we constructed FruitsMusic from 40 music videos of Japanese idol groups on YouTube. The corpus provides detailed annotations and covers a variety of genres, segment division and singer assignment styles, and groups of 4 to 9 members. FruitsMusic also supports the development of other music information retrieval techniques, such as lyrics transcription and singer identification, benefiting not only Japanese idol-group songs but also a wide range of songs by single or multiple singers from various cultures. This paper offers a comprehensive overview of FruitsMusic, including its creation methodology and the characteristics that distinguish it from conversational speech. Additionally, this paper evaluates the efficacy of current methods for singer embedding extraction and diarization under challenging real-world conditions using FruitsMusic. Finally, this paper examines the remaining room for improvement in automatic diarization by evaluating human performance.
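The abstract stresses that a segment may be assigned to several singers at once, one way idol-group songs differ from conversational speech. One possible statistic for this, sketched below, is the share of voiced time during which two or more singers are active, computed with a sweep over segment boundaries. The `(onset, offset, singer)` tuple input, the function name `overlap_stats`, and the example values are hypothetical; this is not necessarily the statistic the paper reports.

```python
def overlap_stats(segments):
    """Return (voiced_time, multi_singer_time) in seconds.

    segments: iterable of (onset, offset, singer) tuples (hypothetical format).
    voiced_time: total time with at least one active singer.
    multi_singer_time: total time with two or more active singers.
    Assumes a singer never has two overlapping segments of their own.
    """
    events = []
    for onset, offset, _singer in segments:
        if offset > onset:  # ignore empty or inverted segments
            events.append((onset, 1))    # a singer starts
            events.append((offset, -1))  # a singer stops
    # Tuples sort by time, then by delta: at equal timestamps -1 precedes +1,
    # so back-to-back segments are not counted as overlapping.
    events.sort()

    voiced = multi = 0.0
    active = 0
    prev_t = None
    for t, delta in events:
        if prev_t is not None and t > prev_t:
            span = t - prev_t
            if active >= 1:
                voiced += span
            if active >= 2:
                multi += span
        active += delta
        prev_t = t
    return voiced, multi


# Hypothetical example: two solo lines, then singerA and singerB together.
example = [
    (0.0, 4.5, "singerA"),
    (4.5, 7.5, "singerB"),
    (7.5, 10.0, "singerA"),
    (7.5, 10.0, "singerB"),
]
voiced, multi = overlap_stats(example)
share = multi / voiced if voiced > 0 else 0.0
print(f"voiced={voiced:.1f}s multi-singer={multi:.1f}s share={share:.2f}")
```

For this toy input the sweep finds 10.0 s of singing, of which 2.5 s (a share of 0.25) has more than one active singer. In typical conversational speech most voiced time has a single active speaker, so a statistic of this kind is one simple way to quantify the contrast the abstract mentions.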

Paper Structure (23 sections, 2 equations, 5 figures, 5 tables)

Figures (5)

  • Figure 1: x-vector, mixed (0.32)
  • Figure 2: x-vector, separated (0.40)
  • Figure 3: ECAPA-TDNN, mixed (0.59)
  • Figure 4: ECAPA-TDNN, separated (0.64)