Transfer Learning with Pseudo Multi-Label Birdcall Classification for DS@GT BirdCLEF 2024
Anthony Miyaguchi, Adrian Cheung, Murilo Gustineli, Ashley Kim
TL;DR
The paper tackles multi-label birdcall classification under distributional shift by using pseudo multi-labeling with embeddings from pre-trained models (Google Bird Vocalization Classifier, BirdNET, EnCodec). It introduces a transfer-learning pipeline that combines surrogate-model predictions, folder-level species cues, and several losses (BCE, ASL, sigmoidF1) to train linear heads on the embedding spaces, with careful preprocessing and pseudo-label construction. BirdNET embeddings paired with BCE and species-label logic often yield the strongest public leaderboard performance (~0.63), while EnCodec underperforms within the given compute constraints. The authors also analyze the embedding spaces with PaCMAP to reveal structure and domain differences. The work demonstrates the practicality of leveraging unlabeled data via pseudo-labels for domain adaptation in bioacoustics, and provides a roadmap for efficient, co-occurrence-aware modeling to improve real-world ecological monitoring under tight inference budgets.
Abstract
We present working notes for the DS@GT team on transfer learning with pseudo multi-label birdcall classification for the BirdCLEF 2024 competition, focused on identifying Indian bird species in recorded soundscapes. Our approach utilizes production-grade models such as the Google Bird Vocalization Classifier, BirdNET, and EnCodec to address representation and labeling challenges in the competition. We explore the distributional shift present in this year's edition of unlabeled soundscapes, which are representative of the hidden test set, and propose a pseudo multi-label classification strategy to leverage the unlabeled data. Our highest post-competition public leaderboard score is 0.63 using BirdNET embeddings with Bird Vocalization pseudo-labels. Our code is available at https://github.com/dsgt-kaggle-clef/birdclef-2024
