Aggregation Strategies for Efficient Annotation of Bioacoustic Sound Events Using Active Learning

Richard Lindholm; Oscar Marklund; Olof Mogren; John Martinsson

TL;DR

This work tackles the annotation bottleneck in bioacoustic sound event detection (SED) by introducing Top $K$ Entropy, an uncertainty-aggregation method that scores each audio file by its most uncertain segments and uses that score to issue file-level queries for active learning. Using a pool-based loop with a segment-based SED model built on $1024$-dimensional YAMNet features, the approach achieves performance close to full supervision while requiring only about $8\%$ of the labels, yielding up to a $92\%$ reduction in annotation effort. The method proves robust to noise variability, generalizes across different event ratios, and demonstrates potential for scalable bioacoustic monitoring in diverse ecological settings. These results highlight the practical value of uncertainty-focused querying for time-series annotation tasks beyond bioacoustics.

Abstract

The vast amounts of audio data collected in Sound Event Detection (SED) applications require efficient annotation strategies to enable supervised learning. Manual labeling is expensive and time-consuming, making Active Learning (AL) a promising approach for reducing annotation effort. We introduce Top K Entropy, a novel uncertainty aggregation strategy for AL that prioritizes the most uncertain segments within an audio recording, instead of averaging uncertainty across all segments. This approach enables the selection of entire recordings for annotation, improving efficiency in sparse data scenarios. We compare Top K Entropy to random sampling and Mean Entropy, and show that it reaches the same model performance with fewer labels, particularly in datasets with sparse sound events. Evaluations are conducted on audio mixtures of sound recordings from parks with meerkat, dog, and baby crying sound events, representing real-world bioacoustic monitoring scenarios. Using Top K Entropy for active learning, we achieve performance comparable to training on the fully labeled dataset with only 8% of the labels. Top K Entropy outperforms Mean Entropy, suggesting that it is better to let the most uncertain segments represent the uncertainty of an audio file. The findings highlight the potential of AL for scalable annotation in audio and time-series applications, including bioacoustics.
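To make the difference between the two aggregation strategies concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each file is represented by per-segment class probabilities, that segment uncertainty is Shannon entropy, and that Top K Entropy is the mean of the K highest segment entropies; the abstract does not spell out the exact formula, so that last choice is an assumption. The function names, file names, and probability values are hypothetical.

```python
import math


def segment_entropy(probs):
    """Shannon entropy (in nats) of one segment's class probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(file_probs):
    """File score: average entropy over all segments."""
    entropies = [segment_entropy(p) for p in file_probs]
    return sum(entropies) / len(entropies)


def top_k_entropy(file_probs, k):
    """File score: average of the k largest segment entropies.

    Assumption: the aggregation is a mean over the top-k segments.
    """
    entropies = sorted((segment_entropy(p) for p in file_probs), reverse=True)
    top = entropies[:max(1, k)]  # slicing handles k larger than the file
    return sum(top) / len(top)


def select_files(pool, score_fn, n_queries):
    """Return the n_queries file names with the highest score."""
    ranked = sorted(pool, key=lambda name: score_fn(pool[name]), reverse=True)
    return ranked[:n_queries]


# Hypothetical unlabeled pool: 100 segments per file, two classes.
pool = {
    # Two highly uncertain segments, the remaining 98 confidently background.
    "sparse_event.wav": [[0.5, 0.5]] * 2 + [[0.99, 0.01]] * 98,
    # Every segment mildly uncertain, none strongly so.
    "mild_noise.wav": [[0.9, 0.1]] * 100,
}

by_mean = select_files(pool, mean_entropy, n_queries=1)
by_top_k = select_files(pool, lambda f: top_k_entropy(f, k=2), n_queries=1)
print(by_mean, by_top_k)
```

In this toy pool, averaging dilutes the two uncertain segments of the sparse file across 98 confident ones, so Mean Entropy queries `mild_noise.wav`, whereas Top K Entropy with `k=2` queries `sparse_event.wav`. This mirrors the sparse-event setting in which the abstract reports the largest benefit.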
