Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection
Shiqi Zhang, Tuomas Virtanen
TL;DR
The paper addresses bioacoustic sound event detection (BioSED) under limited labeled data and severe class imbalance by adapting the mismatch-first farthest-traversal (MFFT) active learning (AL) strategy, a hybrid approach that combines committee disagreement with diversity sampling. It evaluates MFFT on a refined DCASE 2024 Task 5 dataset designed to stress rare-species and unseen-species scenarios, using a frozen PANNs encoder with an MLP head for prediction. MFFT achieves 68% mAP in the cold-start setting and 71% in the warm-start setting while using only 2.3% of the annotations, approaching the fully supervised performance of 75%, and it shows strong rare-species detection. The work offers a practical, annotation-efficient route to scalable biodiversity monitoring, with implications for deploying AL in bioacoustic pipelines and for guiding future scalability improvements.
Abstract
Bioacoustic sound event detection (BioSED) is crucial for biodiversity conservation but faces practical challenges during model development and training: limited annotated data, sparse events, species diversity, and class imbalance. To address these challenges efficiently under a limited labeling budget, we apply mismatch-first farthest-traversal (MFFT), an active learning method that integrates committee voting disagreement and diversity analysis. We also refine an existing BioSED dataset specifically for evaluating active learning algorithms. Experimental results demonstrate that MFFT achieves an mAP of 68% when cold-starting and 71% when warm-starting, close to the fully supervised mAP of 75%, while using only 2.3% of the annotations. Notably, MFFT excels in cold-start scenarios and on rare species, both of which are critical for monitoring endangered species, demonstrating its practical value.
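To make the selection idea concrete, the following is a minimal sketch of a mismatch-first, farthest-traversal query step, not the authors' implementation. It assumes that a "mismatch" means the committee members do not all predict the same label for a segment, and that diversity is measured by Euclidean distance between fixed embeddings. The function name `mfft_select` and the arguments `embeddings`, `committee_preds`, `labeled_idx`, and `budget` are hypothetical names introduced for illustration.

```python
import numpy as np

def mfft_select(embeddings, committee_preds, labeled_idx, budget):
    """Return up to `budget` unlabeled indices, mismatched samples first.

    embeddings:      (n, d) array of segment embeddings (assumed precomputed).
    committee_preds: (m, n) array of predicted labels, one row per committee member.
    labeled_idx:     indices that are already annotated (empty for cold start).
    """
    n = embeddings.shape[0]
    labeled_idx = np.asarray(labeled_idx, dtype=int)

    # Assumption: a sample is a mismatch when any member disagrees with the first.
    mismatch = np.any(committee_preds != committee_preds[0], axis=0)

    # Distance from every sample to its nearest labeled or already selected sample.
    min_dist = np.full(n, np.inf)
    for i in labeled_idx:
        dist = np.linalg.norm(embeddings - embeddings[i], axis=1)
        min_dist = np.minimum(min_dist, dist)

    available = np.ones(n, dtype=bool)
    available[labeled_idx] = False

    selected = []
    while len(selected) < budget and available.any():
        # Mismatch-first: fall back to all unlabeled samples once mismatches run out.
        pool = available & mismatch
        if not pool.any():
            pool = available
        candidates = np.flatnonzero(pool)
        # Farthest traversal: take the candidate farthest from everything chosen so far.
        pick = int(candidates[np.argmax(min_dist[candidates])])
        selected.append(pick)
        available[pick] = False
        dist = np.linalg.norm(embeddings - embeddings[pick], axis=1)
        min_dist = np.minimum(min_dist, dist)
    return selected

# Toy usage: six 1-D embeddings, two committee members, segment 0 already labeled.
emb = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [20.0]])
preds = np.array([[0, 0, 0, 1, 1, 1],
                  [0, 1, 0, 1, 0, 1]])
print(mfft_select(emb, preds, labeled_idx=[0], budget=3))
```

In this toy run the committee disagrees on segments 1 and 4, so those are queried before any segment on which the committee agrees. With an empty `labeled_idx` (cold start), all distances start at infinity and the first mismatched candidate is taken as the seed.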
