
Transfer Learning with Pseudo Multi-Label Birdcall Classification for DS@GT BirdCLEF 2024

Anthony Miyaguchi, Adrian Cheung, Murilo Gustineli, Ashley Kim

TL;DR

The paper tackles multi-label birdcall classification under distributional shift by pseudo multi-labeling on embeddings from pre-trained models (Google Bird Vocalization Classifier, BirdNET, EnCodec). It introduces a transfer-learning pipeline that uses surrogate predictions, folder-level species cues, and several losses (BCE, ASL, sigmoidF1) to train linear heads on the embedding spaces, with careful preprocessing and pseudo-label construction. BirdNET embeddings paired with BCE and species-label logic often yield the strongest public leaderboard performance (~0.63), while EnCodec underperforms within the given compute constraints. The authors also analyze the embedding spaces with PaCMAP to reveal structure and domain differences. The work demonstrates the practicality of using unlabeled data via pseudo-labels for domain adaptation in bioacoustics, and provides a roadmap for efficient, co-occurrence-aware modeling to improve real-world ecological monitoring under tight inference budgets.

Abstract

We present working notes for the DS@GT team on transfer learning with pseudo multi-label birdcall classification for the BirdCLEF 2024 competition, focused on identifying Indian bird species in recorded soundscapes. Our approach utilizes production-grade models such as the Google Bird Vocalization Classifier, BirdNET, and EnCodec to address representation and labeling challenges in the competition. We explore the distributional shift between the training data and this year's unlabeled soundscapes, which are representative of the hidden test set, and propose a pseudo multi-label classification strategy to leverage the unlabeled data. Our highest post-competition public leaderboard score is 0.63 using BirdNET embeddings with Bird Vocalization pseudo-labels. Our code is available at https://github.com/dsgt-kaggle-clef/birdclef-2024
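
The pseudo-label strategy summarized above can be illustrated with a minimal sketch: a linear multi-label head trained with binary cross-entropy against soft labels. This is not the authors' code. The function names, hyperparameters, and random data below are hypothetical stand-ins, where `X` plays the role of embeddings (for example from BirdNET) and `soft_labels` plays the role of soft predictions from a surrogate model such as the Bird Vocalization Classifier.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; targets may be soft values in [0, 1]."""
    probs = np.clip(probs, eps, 1 - eps)
    return float(-np.mean(targets * np.log(probs)
                          + (1 - targets) * np.log(1 - probs)))

def train_linear_head(embeddings, pseudo_labels, lr=0.5, epochs=200):
    """Fit a linear multi-label head to soft pseudo-labels by gradient descent.

    lr and epochs are illustrative assumptions, not values from the paper.
    """
    n, d = embeddings.shape
    k = pseudo_labels.shape[1]
    W = np.zeros((d, k))
    b = np.zeros(k)
    for _ in range(epochs):
        probs = sigmoid(embeddings @ W + b)
        # Gradient of BCE w.r.t. the logits, averaged over examples.
        grad = (probs - pseudo_labels) / n
        W -= lr * embeddings.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b

# Hypothetical data: random "embeddings" and soft labels from a hidden teacher.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 16))
teacher_W = rng.normal(size=(16, 3))
soft_labels = sigmoid(X @ teacher_W)

W, b = train_linear_head(X, soft_labels)
loss_before = bce(sigmoid(np.zeros((256, 3))), soft_labels)  # untrained head
loss_after = bce(sigmoid(X @ W + b), soft_labels)
```

Because the head is linear and the targets are soft probabilities, each species is an independent logistic regression on the embedding space, which keeps inference cheap under a tight compute budget.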

Paper Structure (22 sections, 9 equations, 6 figures, 7 tables)

Figures (6)

  • Figure 1: PaCMAP projections of the averaged embeddings for the top five species ranked by soundscape frequency. Embeddings can be evaluated qualitatively by their clustering behavior.
  • Figure 2: Distribution of species detected in the train and unlabeled soundscapes, sorted by the frequency of species in the soundscapes.
  • Figure 3: Distribution of itemset sizes in the training and soundscape datasets. The distribution represents how likely species are to co-occur in each recording.
  • Figure 4: Diagram of the transfer learning pipeline used for experiments with BirdNET and EnCodec, with the Google Bird Vocalization model as a surrogate. The soft predictions from the Bird Vocalization model are used as pseudo-labels to train a multi-label classifier on BirdNET's embedding space. BirdNET is also replaced with EnCodec for comparison.
  • Figure 5: A clustering analysis of the embeddings and logits extracted from the Bird Vocalization Classifier on the training and soundscape data. We obtain a single vector for each track by taking the max value of the logits and the mean value of the embeddings. The resulting vectors are projected with PaCMAP and show distinctive topology resulting from distinct distributional semantics.
  • ...and 1 more figure
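
The per-track aggregation described in the Figure 5 caption (max over logits, mean over embeddings) and the itemset sizes of Figure 3 can be sketched as follows. This is a hedged illustration, not the paper's implementation: the function names are hypothetical, and the logit threshold of 0.0 (a sigmoid probability of 0.5) used to decide that a species is present is an assumption. The PaCMAP projection step itself is not shown.

```python
import numpy as np

def aggregate_track(window_logits, window_embeddings):
    """Collapse per-window outputs into one vector pair per track.

    window_logits:     (n_windows, n_species)
    window_embeddings: (n_windows, embedding_dim)
    """
    track_logits = window_logits.max(axis=0)           # max value of the logits
    track_embedding = window_embeddings.mean(axis=0)   # mean value of the embeddings
    return track_logits, track_embedding

def itemset_size(track_logits, threshold=0.0):
    """Count species whose track-level logit exceeds the (assumed) threshold."""
    return int((track_logits > threshold).sum())

# Toy track with 3 windows, 4 species, and 2-dimensional embeddings.
window_logits = np.array([
    [ 2.0, -1.0, -3.0, -2.0],
    [-1.0,  0.5, -2.0, -4.0],
    [-2.0, -1.5, -1.0, -3.0],
])
window_embeddings = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])

track_logits, track_embedding = aggregate_track(window_logits, window_embeddings)
n_species_present = itemset_size(track_logits)
```

Taking the max over windows marks a species as present if it is detected anywhere in the track, while the mean embedding summarizes the track's overall acoustic content.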