Hierarchical Label Propagation: A Model-Size-Dependent Performance Booster for AudioSet Tagging

Ludovic Tuncay, Etienne Labbé, Thomas Pellegrini

TL;DR

This work tackles hierarchical label inconsistency in AudioSet by introducing Hierarchical Label Propagation (HLP), which propagates positive labels up the ontology to enforce consistency. The method can be applied either as pre-processing (on the training labels) or as post-processing (on model scores), using the update $s_p = \max(s_p, s_c)$ between a parent $p$ and its child $c$ where the child has a single parent. Across four model architectures and two datasets, HLP yields clear gains for smaller models (e.g., CNN6, ConvNeXt-femto) and modest gains for larger models (e.g., PaSST-B), and cross-dataset evaluation on FSD50K confirms that the benefit transfers. The findings highlight a size-dependent benefit of HLP and offer a practical way to improve label quality and tagging performance, with a public release of the implementation.
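
As a rough illustration of the post-processing variant, the sketch below applies $s_p = \max(s_p, s_c)$ to a dictionary of per-class scores. It assumes a hypothetical toy ontology modeled on Figure 1, stored as a child-to-parents map, and assumes the update is skipped whenever the child has more than one parent. The names `PARENTS`, `depth`, and `hlp_postprocess` are illustrative only, not the authors' released code.

```python
from typing import Dict, List

# Hypothetical toy ontology (child -> parents) modeled on Figure 1; not AudioSet's real graph.
PARENTS: Dict[str, List[str]] = {
    "Domestic animal": ["Animal"],
    "Cat": ["Domestic animal"],
    "Dog": ["Domestic animal"],
    "Growling": ["Cat", "Dog"],  # two parents: treated as ambiguous
}


def depth(node: str, parents: Dict[str, List[str]]) -> int:
    """Length of the longest path from `node` up to a root (roots have depth 0)."""
    node_parents = parents.get(node, [])
    return 0 if not node_parents else 1 + max(depth(p, parents) for p in node_parents)


def hlp_postprocess(scores: Dict[str, float], parents: Dict[str, List[str]]) -> Dict[str, float]:
    """Apply s_p = max(s_p, s_c) for each child c whose only parent is p."""
    out = dict(scores)  # copy, so the caller's scores are not mutated
    # Visit deepest children first, so a raised parent score keeps moving up the chain.
    for child in sorted(parents, key=lambda n: depth(n, parents), reverse=True):
        child_parents = parents[child]
        if len(child_parents) != 1:
            continue  # assumption: multi-parent children leave their parents unchanged
        p = child_parents[0]
        out[p] = max(out.get(p, 0.0), out.get(child, 0.0))
    return out


scores = {"Animal": 0.1, "Domestic animal": 0.2, "Cat": 0.7, "Dog": 0.05, "Growling": 0.9}
print(hlp_postprocess(scores, PARENTS))
```

Here Cat's score of 0.7 is carried up to Domestic animal and then to Animal, while Growling's 0.9 raises neither Cat nor Dog, mirroring the ambiguity shown in Figure 1. Processing children in decreasing depth means a single pass is enough for the updates to chain to the root.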

Abstract

AudioSet is one of the largest and most widely used datasets in audio tagging, containing about 2 million audio clips manually labeled with 527 event categories organized into an ontology. However, its annotations contain inconsistencies: in particular, categories that should be positive according to the ontology are frequently labeled as negative. To address this issue, we apply Hierarchical Label Propagation (HLP), which propagates labels up the ontology hierarchy. HLP raises the mean number of positive labels per audio clip from 1.98 to 2.39 and affects 109 of the 527 classes. Our results show that HLP improves performance across several model architectures, including convolutional neural networks (PANNs' CNN6 and ConvNeXt) and transformers (PaSST), with smaller models benefiting more. Finally, on FSD50K, another widely used dataset, models trained on AudioSet with HLP consistently outperformed those trained without it. Our source code will be made available on GitHub.
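
The pre-processing variant and the two label statistics quoted above (mean positives per clip, number of affected classes) can be sketched on binary label sets. This is a minimal sketch under stated assumptions: the toy ontology is hypothetical, propagation is assumed to follow single-parent links only, and "affected" is assumed to mean a class that gains at least one positive. The helpers `propagate_labels` and `label_stats` are illustrative names, and the toy numbers are not the paper's results.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy ontology (child -> parents) modeled on Figure 1; not AudioSet's real graph.
PARENTS: Dict[str, List[str]] = {
    "Domestic animal": ["Animal"],
    "Cat": ["Domestic animal"],
    "Dog": ["Domestic animal"],
    "Growling": ["Cat", "Dog"],
}


def propagate_labels(labels: Set[str], parents: Dict[str, List[str]]) -> Set[str]:
    """Add every ancestor reachable from a positive label through single-parent links."""
    out = set(labels)
    stack = list(labels)
    while stack:
        node = stack.pop()
        node_parents = parents.get(node, [])
        # Assumption: a child with several parents is ambiguous and propagates nothing.
        if len(node_parents) == 1 and node_parents[0] not in out:
            out.add(node_parents[0])
            stack.append(node_parents[0])
    return out


def label_stats(
    clips: List[Set[str]], parents: Dict[str, List[str]]
) -> Tuple[float, float, Set[str]]:
    """Mean positives per clip before/after propagation, and the classes that gained positives."""
    propagated = [propagate_labels(c, parents) for c in clips]
    before = sum(len(c) for c in clips) / len(clips)
    after = sum(len(p) for p in propagated) / len(clips)
    affected: Set[str] = set()
    for new, old in zip(propagated, clips):
        affected |= new - old  # classes switched from negative to positive in this clip
    return before, after, affected


clips = [{"Growling"}, {"Cat"}, {"Domestic animal", "Growling"}, {"Animal"}]
before, after, affected = label_stats(clips, PARENTS)
print(before, after, sorted(affected))  # 1.25 2.0 ['Animal', 'Domestic animal']
```

On this toy batch the mean rises from 1.25 to 2.0 positives per clip and two classes are affected; applied to the full AudioSet label matrix, the same kind of count is what the abstract reports.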

Paper Structure

This paper contains 10 sections, 1 figure, 3 tables.

Figures (1)

  • Figure 1: Illustration of HLP in a taxonomic ontology. The left side shows the initial state, with Domestic animal and Growling as positive labels (green nodes). The right side shows the state after applying HLP, where the positive label is propagated upward to Animal (green dashed arrow). The Cat and Dog nodes illustrate ambiguous propagation (red nodes and dashed arrows): the presence of Growling does not uniquely determine whether it originates from Cat or from Dog, two sibling nodes. In our method, these ambiguous labels are kept as negatives.
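
The ambiguity in Figure 1 can be detected directly from the ontology structure. The sketch below assumes entries shaped like the public AudioSet `ontology.json` file, where each class lists its `id`, `name`, and `child_ids`; the toy IDs are placeholders rather than real AudioSet MIDs, and `parent_map` and `ambiguous_classes` are hypothetical helpers. It inverts the child lists into a child-to-parents map and reports classes with more than one parent, the kind of case Figure 1 marks as ambiguous.

```python
from collections import defaultdict
from typing import Dict, List

# Toy entries in the shape of AudioSet's ontology.json ("id", "name", "child_ids").
# The IDs are placeholders, not real AudioSet MIDs.
ONTOLOGY = [
    {"id": "animal", "name": "Animal", "child_ids": ["domestic"]},
    {"id": "domestic", "name": "Domestic animal", "child_ids": ["cat", "dog"]},
    {"id": "cat", "name": "Cat", "child_ids": ["growl"]},
    {"id": "dog", "name": "Dog", "child_ids": ["growl"]},
    {"id": "growl", "name": "Growling", "child_ids": []},
]


def parent_map(entries: List[dict]) -> Dict[str, List[str]]:
    """Invert each class's child_ids into a child -> parents mapping."""
    parents: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        for child in entry["child_ids"]:
            parents[child].append(entry["id"])
    return dict(parents)


def ambiguous_classes(entries: List[dict]) -> List[str]:
    """Names of classes with more than one parent, sorted alphabetically."""
    names = {entry["id"]: entry["name"] for entry in entries}
    return sorted(names[c] for c, ps in parent_map(entries).items() if len(ps) > 1)


print(ambiguous_classes(ONTOLOGY))
```

For this toy ontology only Growling is reported, since it is listed as a child of both Cat and Dog; the resulting map can be fed to a propagation routine that skips such classes.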