Automatic acoustic detection of birds through deep learning: the first Bird Audio Detection challenge
Dan Stowell, Yannis Stylianou, Mike Wood, Hanna Pamuła, Hervé Glotin
TL;DR
The paper tackles automatic detection of bird sounds in passive acoustic monitoring by launching a public data challenge designed to stress generalisation across species and environments. It demonstrates that modern deep-learning approaches, especially CNN/CRNN architectures, can achieve high discrimination (around 88% AUC) without target-specific retraining, using diverse datasets (CEZ, Warblr, Freefield1010, PolandNFC) and a rigorous evaluation framework that includes calibration analysis. The work provides new annotated datasets, analyses error modes, and assesses cross-site performance, revealing a gap between matched- and mismatched-condition generalisation while showing practical potential for remote monitoring. The PolandNFC experiments further probe transfer and self-adaptation, indicating benefits under certain conditions and guiding deployment considerations, including calibration and on-device feasibility. Overall, the study advances scalable automatic bird detection for ecosystem monitoring and informs best practices for deployment and future methodological improvements.
Abstract
Assessing the presence and abundance of birds is important for monitoring specific species as well as overall ecosystem health. Many birds are most readily detected by their sounds, and thus passive acoustic monitoring is highly appropriate. Yet acoustic monitoring is often held back by practical limitations such as the need for manual configuration, reliance on example sound libraries, low accuracy, low robustness, and limited ability to generalise to novel acoustic conditions. Here we report outcomes from a collaborative data challenge showing that with modern machine learning including deep learning, general-purpose acoustic bird detection can achieve very high retrieval rates in remote monitoring data, with no manual recalibration, and no pre-training of the detector for the target species or the acoustic conditions in the target environment. Multiple methods were able to attain performance of around 88% AUC (area under the ROC curve), much higher performance than previous general-purpose methods. We present new acoustic monitoring datasets, summarise the machine learning techniques proposed by challenge teams, conduct detailed performance evaluation, and discuss how such approaches to detection can be integrated into remote monitoring projects.
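
To make the headline metric concrete, the sketch below computes AUC for a binary bird/no-bird detector using the standard rank interpretation: the probability that a randomly chosen bird-present clip receives a higher score than a randomly chosen bird-absent clip, with ties counted as half. It is an illustration only, not the challenge's evaluation code; the function name `roc_auc` and the eight clip labels and scores are hypothetical values invented for this example.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: iterable of 0/1 ground-truth flags (1 = bird present).
    scores: iterable of detector outputs, higher = more confident in "bird".
    Returns the fraction of (positive, negative) pairs ranked correctly,
    counting tied scores as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical example: eight clips, four with birds and four without.
labels = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.35, 0.4, 0.2, 0.1, 0.7, 0.6]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")  # 14 of 16 pairs ranked correctly -> 0.875
```

Because AUC depends only on the ranking of scores, it does not require choosing a detection threshold, which is one reason it suits comparisons across recording conditions. The pairwise loop is quadratic in the number of clips; for large datasets a sort-based computation or a library routine would be the practical choice.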
