CryCeleb: A Speaker Verification Dataset Based on Infant Cry Sounds

David Budaghyan, Charles C. Onu, Arsenii Gorin, Cem Subakan, Doina Precup

TL;DR

This paper describes the Ubenwa CryCeleb dataset, a labeled collection of infant cries, and the accompanying CryCeleb 2023 task, a public speaker verification challenge based on cry sounds, with the aim of encouraging research in infant cry analysis.

Abstract

This paper describes the Ubenwa CryCeleb dataset - a labeled collection of infant cries - and the accompanying CryCeleb 2023 task, which is a public speaker verification challenge based on cry sounds. We released more than 6 hours of manually segmented cry sounds from 786 newborns for academic use, aiming to encourage research in infant cry analysis. The inaugural public competition attracted 59 participants, 11 of whom improved on the baseline performance. The top-performing system achieved a significant improvement, scoring a 25.8% equal error rate, which is still far from the performance of state-of-the-art adult speaker verification systems. Therefore, we believe there is room for further research on this dataset, potentially extending beyond the verification task.

Paper Structure (10 sections, 5 figures, 5 tables)

Figures (5)

  • Figure 1: Histogram of cry sound durations.
  • Figure 2: Number of infants per number of cry sounds.
  • Figure 3: CryCeleb challenge verification task. Given two recordings, predict whether they belong to the same infant.
  • Figure 4: Verification scores for negative and positive pairs produced by the ECAPA-TDNN pre-trained on VoxCeleb speech data (left) and fine-tuned on CryCeleb (right). Classification threshold (red dashed line) is selected to minimize EER.
  • Figure 5: Equal error rates of CryCeleb 2023 competition submissions. The public subset of the test set was available throughout the competition, while the private subset was used only after the competition ended to evaluate 5 submissions selected by each participant.
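
The equal error rate (EER) cited in the abstract and in Figures 4 and 5 is the error rate at the threshold where the false acceptance rate (different infants judged the same) equals the false rejection rate (same infant judged different). The following is a minimal sketch of that calculation, not the challenge's official scoring code. The function name `equal_error_rate`, the convention that a pair is accepted when its score is at or above the threshold, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def equal_error_rate(scores, labels):
    """Estimate the EER and its threshold from verification scores.

    scores: similarity score per pair (higher = more likely same infant).
    labels: 1 for a positive pair (same infant), 0 for a negative pair.
    Assumption: a pair is accepted when score >= threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    positives = scores[labels == 1]
    negatives = scores[labels == 0]
    if positives.size == 0 or negatives.size == 0:
        raise ValueError("need at least one positive and one negative pair")

    # Every observed score is a candidate threshold.
    thresholds = np.unique(scores)
    # False acceptance rate: fraction of negative pairs that are accepted.
    far = np.array([(negatives >= t).mean() for t in thresholds])
    # False rejection rate: fraction of positive pairs that are rejected.
    frr = np.array([(positives < t).mean() for t in thresholds])

    # On finite data the two rates rarely cross exactly, so take the
    # threshold where they are closest and average them.
    idx = int(np.argmin(np.abs(far - frr)))
    eer = (far[idx] + frr[idx]) / 2.0
    return float(eer), float(thresholds[idx])


if __name__ == "__main__":
    # Hypothetical toy scores, not CryCeleb data.
    toy_scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
    toy_labels = [1, 1, 1, 0, 0, 0]
    eer, threshold = equal_error_rate(toy_scores, toy_labels)
    print(f"EER = {eer:.3f} at threshold {threshold:.2f}")
```

Scanning only the observed scores keeps the sketch short. Evaluation toolkits often interpolate between adjacent thresholds instead, which can give slightly different values on small test sets.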