SALT: Standardized Audio event Label Taxonomy
Paraskevas Stamatiadis, Michel Olvera, Slim Essid
TL;DR
The paper addresses fragmentation in audio labeling caused by disparate taxonomies across datasets. It introduces SALT, a standardized audio event label taxonomy built on AudioSet and extended to 734 labels across 24 datasets, with a Python package (py-salt) for navigation, searching, and cross-dataset aggregation. Key contributions include a formal label standardization scheme, hierarchical structure with new categories Water and Other, and tooling for mapping, exploration, and visualization to enable data aggregation and cross-dataset benchmarking. This approach facilitates scalable, reproducible machine listening research by reducing labeling inconsistencies and enabling unified evaluation across diverse datasets.
Abstract
Machine listening systems often rely on fixed taxonomies to organize and label audio data, which is key to training and evaluating deep neural networks (DNNs) and other supervised algorithms. However, such taxonomies face significant constraints: they are composed of application-dependent predefined categories, which hinders the integration of new or varied sounds, and they exhibit limited cross-dataset compatibility due to inconsistent labeling standards. To overcome these limitations, we introduce SALT: Standardized Audio event Label Taxonomy. Building upon the hierarchical structure of AudioSet's ontology, our taxonomy extends and standardizes labels across 24 publicly available environmental sound datasets, allowing class labels from diverse datasets to be mapped to a unified system. Our proposal comes with a new Python package designed for navigating and utilizing this taxonomy, easing cross-dataset label searching and hierarchical exploration. Notably, our package allows effortless data aggregation from diverse sources, and hence easy experimentation with combined datasets.
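To make the idea of mapping dataset-specific labels to a unified hierarchy concrete, the following is a minimal sketch in plain Python. It does not use the py-salt package and does not reflect its API; the dataset names, original labels, the small hierarchy, and the helper functions `standardize`, `ancestors`, and `aggregate` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical child -> parent links in a tiny AudioSet-style hierarchy.
# None marks a top-level category. These entries are illustrative only.
PARENT = {
    "Dog": "Domestic animals, pets",
    "Domestic animals, pets": "Animal",
    "Animal": None,
    "Rain": "Water",
    "Water": None,
}

# Hypothetical (dataset, original label) -> standardized label mapping.
LABEL_MAP = {
    ("dataset_a", "dog_bark"): "Dog",
    ("dataset_b", "dog"): "Dog",
    ("dataset_a", "rain"): "Rain",
    ("dataset_b", "raining"): "Rain",
}


def standardize(dataset, label):
    """Map a dataset-specific label to its standardized label."""
    return LABEL_MAP[(dataset, label)]


def ancestors(label):
    """Return the chain of parent labels, from nearest to top-level."""
    chain = []
    parent = PARENT[label]
    while parent is not None:
        chain.append(parent)
        parent = PARENT[parent]
    return chain


def aggregate(clips):
    """Group (dataset, clip_id, original_label) records by standardized label."""
    grouped = defaultdict(list)
    for dataset, clip_id, label in clips:
        grouped[standardize(dataset, label)].append((dataset, clip_id))
    return dict(grouped)


# Hypothetical clip records from two datasets with inconsistent labels.
clips = [
    ("dataset_a", "a_001", "dog_bark"),
    ("dataset_b", "b_417", "dog"),
    ("dataset_b", "b_902", "raining"),
]

combined = aggregate(clips)
print(combined)
print(ancestors("Dog"))
```

Once every dataset label resolves to a single standardized name, clips from different sources fall into the same group, and a parent lookup allows evaluation at a coarser level of the hierarchy (for example, treating every "Dog" clip as "Animal").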
