Generative AI-based data augmentation for improved bioacoustic classification in noisy environments
Anthony Gibbons, Emma King, Ian Donohue, Andrew Parnell
TL;DR
This work tackles data scarcity in bioacoustic species classification by introducing generative AI-based spectrogram augmentation using ACGAN and diffusion models. It demonstrates that Stable Diffusion, in a latent-diffusion framework, produces high-quality, diverse spectrograms that improve classifier performance in noisy wind-farm environments when added to real data. Although synthetic augmentation yields notable gains, especially on validation data, it does not surpass a strong BirdNET baseline on human-labelled test data, highlighting ongoing challenges with pseudo-label bias and data representativeness. The study provides practical insights and resources for incorporating synthetic data into ecoacoustic pipelines and lays a foundation for broader applications across taxa and habitats.
Abstract
Obtaining data to train robust artificial intelligence (AI)-based models for species classification can be challenging, particularly for rare species. Data augmentation can boost classification accuracy by increasing the diversity of training data, and it is cheaper than collecting expert-labelled data. However, many classic image-based augmentation techniques are not suitable for audio spectrograms. We investigate two generative AI models as data augmentation tools that synthesise spectrograms to supplement audio data: Auxiliary Classifier Generative Adversarial Networks (ACGAN) and Denoising Diffusion Probabilistic Models (DDPMs). The DDPMs performed particularly well, both in the realism of the generated spectrograms and in the accuracy of the resulting classification task. Alongside these new approaches, we present a new audio data set of 640 hours of bird calls recorded at wind farm sites in Ireland, of which approximately 800 samples have been labelled by experts. Wind farm data are particularly challenging for classification models because of background wind and turbine noise. An ensemble of classification models trained on combined real and synthetic data compared well with highly confident BirdNET predictions. Every classifier we used improved when synthetic data were included, and classification metrics generally improved in line with the amount of synthetic data added. Our approach can be used to augment acoustic signals for more species and other land-use types, and has the potential to advance our capacity to develop reliable AI-based detection of rare species. Our code is available at https://github.com/gibbona1/SpectrogramGenAI.
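For readers unfamiliar with DDPMs, the sketch below shows the standard closed-form forward (noising) process from Ho et al. (2020), x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε, applied to a spectrogram-shaped array. A generative model is trained to reverse this process and so synthesise new spectrograms. This is an illustrative sketch under stated assumptions, not the authors' pipeline (see the repository above for that): the linear β schedule (1e-4 to 0.02 over 1000 steps) follows Ho et al., while the 128 × 256 array size and the [-1, 1] normalisation are hypothetical stand-ins for a real log-mel spectrogram. In a latent-diffusion setup such as Stable Diffusion, the same noising is applied to autoencoder latents rather than directly to the spectrogram.

```python
import numpy as np


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Linear noise-variance schedule as in Ho et al. (2020)."""
    return np.linspace(beta_start, beta_end, T)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Closed-form forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


# Cumulative product of (1 - beta_t) gives abar_t, the fraction of signal kept at step t.
betas = linear_beta_schedule()
alpha_bar = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
# Hypothetical stand-in for a normalised log-mel spectrogram: 128 mel bins x 256 frames in [-1, 1].
spec = rng.uniform(-1.0, 1.0, size=(128, 256))
eps = rng.standard_normal(spec.shape)

x_early = q_sample(spec, 10, alpha_bar, eps)   # still close to the original spectrogram
x_late = q_sample(spec, 999, alpha_bar, eps)   # almost pure Gaussian noise
```

Early steps keep most of the spectrogram's structure, while the final step is almost indistinguishable from Gaussian noise. This is why a trained reverse process can start from noise and generate realistic, varied spectrograms for augmentation.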
