ICSD: An Open-source Dataset for Infant Cry and Snoring Detection

Qingyu Liu, Longfei Song, Dongxing Xu, Yanhua Long

TL;DR

The paper addresses the lack of dedicated datasets for infant cry and snoring detection by introducing the open ICSD dataset, which combines weakly labeled, real strongly labeled, and synthetic strongly labeled data. It presents three baseline systems—MT-CRNN, CRNN+BEATs, and Competitive CRNN+BEATs—built on a unified data format and evaluated with robust metrics (Inter-F1, Seg-F1, and FPE per hour) across real and synthetic test sets. The ICSD toolkit includes a Scaper-based synthetic data pipeline, a clear data release ontology, and extensive ablation analyses showing the value of mixing synthetic and weakly labeled data with limited real strongly labeled data. The work provides a practical, open benchmark for future ICSD research with insights into model design, data utility, and deployment considerations in home environments.

Abstract

The detection and analysis of infant cry and snoring events are crucial tasks within the field of audio signal processing. While existing datasets for general sound event detection are plentiful, they often fall short in providing sufficient strongly labeled data specific to infant cries and snoring. To provide a benchmark dataset and thus foster research on infant cry and snoring detection, this paper introduces the Infant Cry and Snoring Detection (ICSD) dataset, a novel, publicly available dataset designed specifically for ICSD tasks. ICSD comprises three types of subsets: a real strongly labeled subset with manually annotated event-based labels, a weakly labeled subset with only clip-level event annotations, and a synthetic subset generated and labeled with strong annotations. This paper provides a detailed description of the ICSD creation process, including the challenges encountered and the solutions adopted. We offer a comprehensive characterization of the dataset, discussing its limitations and key factors for ICSD usage. Additionally, we conduct extensive experiments on the ICSD dataset to establish baseline systems and offer insights into the main factors to consider when using this dataset for ICSD research. Our goal is to develop a dataset that will be widely adopted by the community as a new open benchmark for future ICSD research.
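The distinction between event-based (strong) and clip-level (weak) annotations can be made concrete with a small sketch. This is an illustration only, not the dataset's actual file format or the authors' tooling: the class names `infant_cry` and `snoring`, the `StrongEvent` type, the helper functions, and the 1-second frame hop are all assumptions chosen for readability.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class StrongEvent:
    """One strong (event-based) annotation: a class plus onset/offset in seconds."""
    label: str
    onset: float
    offset: float


def to_weak_labels(events):
    """Collapse strong annotations into weak ones: the set of classes present in the clip."""
    return sorted({e.label for e in events})


def to_frame_targets(events, classes, clip_dur, hop):
    """Rasterize strong annotations into a frames x classes grid of 0/1 targets."""
    n_frames = int(round(clip_dur / hop))
    grid = [[0] * len(classes) for _ in range(n_frames)]
    for e in events:
        col = classes.index(e.label)
        start = max(0, int(e.onset / hop))               # first frame touched by the event
        end = min(n_frames, math.ceil(e.offset / hop))   # one past the last frame touched
        for t in range(start, end):
            grid[t][col] = 1
    return grid


# Hypothetical 10-second clip with two annotated events (values are made up).
classes = ["infant_cry", "snoring"]
events = [
    StrongEvent("infant_cry", onset=1.0, offset=3.5),
    StrongEvent("snoring", onset=6.0, offset=9.0),
]

weak = to_weak_labels(events)                      # clip-level tags only, no timing
grid = to_frame_targets(events, classes, 10.0, 1.0)
cry_track = [row[0] for row in grid]               # per-frame activity for infant_cry
snore_track = [row[1] for row in grid]             # per-frame activity for snoring
```

A weak label keeps only what `to_weak_labels` returns, so a model trained on it knows which classes occur but not when. A strong label supports the frame-level targets that a detection model needs in order to learn onsets and offsets.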

Paper Structure

This paper contains 22 sections, 3 equations, 6 figures, 10 tables.

Figures (6)

  • Figure 1: Duration histogram of events and time intervals.
  • Figure 2: Spectrogram samples of 12 types of background sounds.
  • Figure 3: Illustration of SSL data synthesizing.
  • Figure 4: The structure of the ICSD dataset ontology.
  • Figure 5: Training procedure illustration of the Competitive CRNN+BEATs model.
  • ...and 1 more figure