Transfer Learning with Semi-Supervised Dataset Annotation for Birdcall Classification
Anthony Miyaguchi, Nathan Zhong, Murilo Gustineli, Chris Hayduk
TL;DR
The paper addresses the problem of labeling long African soundscapes that contain many bird species. It combines transfer learning on BirdNET embeddings ($320$-dimensional) with semi-supervised dataset annotation supported by MixIT source separation. It outlines the full pipeline (source separation, embedding extraction, and annotated dataset construction) and evaluates modeling approaches ranging from logistic regression to ensemble methods, with the baseline performing competitively. The findings show that a simple classifier in the embedding space can be effective, while more complex feature engineering yields mixed results, which underscores the importance of data quality and representation. The framework is scalable and may apply to other bioacoustic tasks and species-rich domains.
Abstract
We present working notes on transfer learning with semi-supervised dataset annotation for the BirdCLEF 2023 competition, which focuses on identifying African bird species in recorded soundscapes. Our approach uses two off-the-shelf models, BirdNET and MixIT, to address the representation and labeling challenges in the competition. We explore the embedding space learned by BirdNET and propose a process to derive an annotated dataset for supervised learning. Our experiments cover various models and feature engineering approaches aimed at maximizing performance on the competition leaderboard. The results demonstrate the effectiveness of our approach in classifying bird species and highlight the potential of transfer learning and semi-supervised dataset annotation in similar tasks.
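To make the idea of a simple classifier in the embedding space concrete, the following is a minimal sketch of multinomial logistic regression trained on 320-dimensional vectors. It is not the authors' implementation: the embeddings are synthetic Gaussian clusters standing in for BirdNET outputs, and the species count, learning rate, epoch count, and all function names are assumptions chosen for illustration.

```python
import numpy as np

EMBEDDING_DIM = 320  # embedding size stated in the text
N_SPECIES = 3        # hypothetical: small class count for a toy example


def make_toy_embeddings(n_per_class=50, seed=0):
    """Synthetic stand-in for BirdNET embeddings: one Gaussian cluster per species."""
    rng = np.random.default_rng(seed)
    centers = rng.normal(0.0, 1.0, size=(N_SPECIES, EMBEDDING_DIM))
    X = np.concatenate(
        [c + 0.5 * rng.normal(size=(n_per_class, EMBEDDING_DIM)) for c in centers]
    )
    y = np.repeat(np.arange(N_SPECIES), n_per_class)
    return X, y


def softmax(z):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def fit_logistic_regression(X, y, lr=0.01, epochs=200):
    """Full-batch gradient descent on the mean cross-entropy loss."""
    n, d = X.shape
    W = np.zeros((d, N_SPECIES))
    b = np.zeros(N_SPECIES)
    Y = np.eye(N_SPECIES)[y]  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W + b)
        grad = (P - Y) / n  # gradient of the loss with respect to the logits
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


def predict(X, W, b):
    """Return the most probable class index for each embedding."""
    return softmax(X @ W + b).argmax(axis=1)


X, y = make_toy_embeddings()
W, b = fit_logistic_regression(X, y)
train_accuracy = float((predict(X, W, b) == y).mean())
print(f"training accuracy on toy embeddings: {train_accuracy:.2f}")
```

In practice the rows of `X` would be embeddings extracted from audio segments and `y` would come from the annotated dataset described above. The toy clusters are far easier to separate than real birdcall data, so the accuracy printed here says nothing about competition performance.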
