SALT: Standardized Audio event Label Taxonomy

Paraskevas Stamatiadis, Michel Olvera, Slim Essid

TL;DR

The paper addresses fragmentation in audio labeling caused by disparate taxonomies across datasets. It introduces SALT, a standardized audio event label taxonomy built on AudioSet and extended to 734 labels across 24 datasets, with a Python package (py-salt) for navigation, searching, and cross-dataset aggregation. Key contributions include a formal label standardization scheme, hierarchical structure with new categories Water and Other, and tooling for mapping, exploration, and visualization to enable data aggregation and cross-dataset benchmarking. This approach facilitates scalable, reproducible machine listening research by reducing labeling inconsistencies and enabling unified evaluation across diverse datasets.

Abstract

Machine listening systems often rely on fixed taxonomies to organize and label audio data, which is key for training and evaluating deep neural networks (DNNs) and other supervised algorithms. However, such taxonomies face significant constraints: they are composed of application-dependent predefined categories, which hinders the integration of new or varied sounds, and they exhibit limited cross-dataset compatibility due to inconsistent labeling standards. To overcome these limitations, we introduce SALT: Standardized Audio event Label Taxonomy. Building upon the hierarchical structure of AudioSet's ontology, our taxonomy extends and standardizes labels across 24 publicly available environmental sound datasets, allowing the mapping of class labels from diverse datasets to a unified system. Our proposal comes with a new Python package designed for navigating and utilizing this taxonomy, easing cross-dataset label searching and hierarchical exploration. Notably, our package allows effortless data aggregation from diverse sources and hence easy experimentation with combined datasets.

Paper Structure (9 sections, 5 figures)

Figures (5)

  • Figure 1: Illustration of SALT's standardization process. Dataset labels are systematically mapped to a standard label that ensures cross-dataset compatibility.
  • Figure 2: Contribution of each dataset's original (default) labels to SALT after the standardization process.
  • Figure 3: Example of standard label mapping for the standardized label bird.
  • Figure 4: Example of dataset label mapping for the standardized label car_horn.
  • Figure 5: The benefit of label aggregation in selected standardized labels targeting domestic sound events.
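
The standardization and aggregation idea behind Figures 1 and 3 to 5 can be illustrated with a minimal sketch in plain Python. It does not use py-salt, whose API is not described here. The dataset names, original labels, clip identifiers, and the helpers `standardize` and `aggregate` are all hypothetical; only the standardized labels `bird` and `car_horn` come from the figure captions.

```python
from collections import defaultdict

# Hypothetical mapping: (dataset, original label) -> standardized label.
# Dataset names and original labels are invented for illustration only.
LABEL_MAP = {
    ("dataset_a", "Car horn"): "car_horn",
    ("dataset_b", "honking"): "car_horn",
    ("dataset_a", "Bird song"): "bird",
    ("dataset_c", "chirping_birds"): "bird",
}


def standardize(dataset, label):
    """Return the standardized label, or None if the pair is unmapped."""
    return LABEL_MAP.get((dataset, label))


def aggregate(clips):
    """Group clips from several datasets under their standardized label."""
    grouped = defaultdict(list)
    for dataset, label, clip_id in clips:
        standard = standardize(dataset, label)
        if standard is not None:  # skip labels with no known mapping
            grouped[standard].append((dataset, clip_id))
    return dict(grouped)


# Hypothetical clips drawn from three differently labeled datasets.
clips = [
    ("dataset_a", "Car horn", "a_001"),
    ("dataset_b", "honking", "b_017"),
    ("dataset_c", "chirping_birds", "c_203"),
    ("dataset_c", "unmapped_label", "c_204"),
]
aggregated = aggregate(clips)
```

Keying the mapping on the (dataset, label) pair rather than on the label alone is a design choice of this sketch: it lets the same original string map to different standardized labels in different datasets.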