Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022
Anthony Miyaguchi, Jiangyue Yu, Bryan Cheungvivatpant, Dakota Dudley, Aniketh Swain
TL;DR
The paper tackles multi-label birdcall classification in BirdCLEF 2022 with unsupervised learning, because per-window labels are sparse. It combines SiMPle motif mining on spectrograms, which extracts salient 5-second patterns and generates soft labels, with a modified Tile2Vec embedding. The embedding is trained with a triplet-loss objective so that similar birdcall motifs lie near each other in a low-dimensional space. Downstream classifiers are trained on the embedding, but the best public leaderboard score is about 0.48. This leaves substantial room for improvement and highlights practical engineering challenges. The study demonstrates the potential of unsupervised representation learning for bioacoustic tasks and outlines concrete directions for BirdCLEF-style challenges, such as refining spectrogram parameters, improving triplet formation, and building more efficient implementations.
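To make the motif-mining step concrete, the sketch below computes a brute-force, SiMPle-style similarity profile over a spectrogram. For every window of `m` frames, it finds the Euclidean distance to its nearest non-overlapping window, and the window with the smallest profile value is the top motif. This is an illustrative sketch under stated assumptions, not the authors' implementation. The function name `simple_matrix_profile`, the flattening of each window into one vector, and the `m // 2` exclusion zone are choices made here. The published SiMPle algorithm uses FFT-based speedups that this O(n²) version omits.

```python
import numpy as np


def simple_matrix_profile(spec: np.ndarray, m: int) -> tuple[np.ndarray, np.ndarray]:
    """Brute-force SiMPle-style profile (hypothetical helper, illustration only).

    spec: (n_features, n_frames) spectrogram, e.g. mel bins x time frames.
    m:    window length in frames (for 5-second windows this depends on the
          sample rate and hop length, which are assumptions here).
    Returns (profile, index): for each window start i, the distance to its
    nearest non-trivial match and the start frame of that match.
    """
    n_windows = spec.shape[1] - m + 1
    # Flatten each (n_features x m) window into one vector, so the Euclidean
    # distance between vectors equals the multidimensional window distance.
    windows = np.stack([spec[:, i:i + m].ravel() for i in range(n_windows)])
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(windows ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * windows @ windows.T
    dist = np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negative rounding errors
    # Exclusion zone: heavily overlapping windows are trivial self-matches.
    excl = max(1, m // 2)
    idx = np.arange(n_windows)
    dist[np.abs(idx[:, None] - idx[None, :]) < excl] = np.inf
    index = np.argmin(dist, axis=1)
    profile = dist[idx, index]
    return profile, index
```

Once the profile is available, `i = int(np.argmin(profile))` gives the start of the best-matching motif, and `spec[:, i:i + m]` is the corresponding spectrogram patch. The full distance matrix makes memory grow quadratically with recording length, which is why an efficient implementation matters for long field recordings.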
Abstract
We build a classification model for the BirdCLEF 2022 challenge using unsupervised methods. We learn an unsupervised representation of the training dataset by applying a triplet loss to spectrogram representations of audio motifs. Our best model achieves a score of 0.48 on the public leaderboard.
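The triplet objective can be illustrated with a generic hinge-form triplet loss of the kind Tile2Vec uses. An anchor embedding is pulled toward a positive (a similar motif) and pushed away from a negative (an unrelated window) until the gap exceeds a margin. This is a minimal sketch that assumes Euclidean distances and a margin of 1.0. The paper's specific modifications to Tile2Vec, its distance, and its margin are not stated here. Forming triplets as (mined motif, its nearest-neighbor match, random window) is likewise only an assumed example.

```python
import numpy as np


def triplet_loss(anchor: np.ndarray, positive: np.ndarray, negative: np.ndarray,
                 margin: float = 1.0) -> float:
    """Hinge triplet loss with Euclidean distances, averaged over the batch.

    Each argument is a (batch, dim) array of embeddings. A triplet contributes
    zero loss once its anchor is closer to the positive than to the negative
    by at least `margin`.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-to-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-to-negative distance
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# Toy batch: the first triplet is already well separated, the second is inverted.
anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.0, 0.0], [3.0, 4.0]])
negative = np.array([[3.0, 4.0], [0.0, 0.0]])
print(triplet_loss(anchor, positive, negative))  # (0 + 6) / 2 = 3.0
```

In training, the embeddings would come from an encoder applied to spectrogram patches, and the loss would be minimized with respect to the encoder's parameters. This forward-only NumPy version shows only how the objective scores a batch.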
