Motif Mining and Unsupervised Representation Learning for BirdCLEF 2022

Anthony Miyaguchi; Jiangyue Yu; Bryan Cheungvivatpant; Dakota Dudley; Aniketh Swain

TL;DR

The paper tackles multi-label birdcall classification in BirdCLEF 2022 with unsupervised learning, because per-window labels are sparse. It combines two components: SiMPle motif mining on spectrograms, which extracts salient 5-second patterns and generates soft labels, and a modified Tile2Vec embedding trained with a triplet loss so that similar birdcall motifs lie near each other in a low-dimensional space. Downstream classifiers are trained on the embedding, but the best reported public leaderboard score is 0.48, which leaves substantial room for improvement and highlights practical engineering challenges. The study demonstrates the potential of unsupervised representation learning for bioacoustic tasks and outlines concrete directions for future BirdCLEF-style work, such as refining spectrogram parameters, triplet formation, and efficient implementations.

Abstract

We build a classification model for the BirdCLEF 2022 challenge using unsupervised methods. We learn an unsupervised representation of the training dataset using a triplet loss on spectrogram representations of audio motifs. Our best model achieves a score of 0.48 on the public leaderboard.
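
To make the triplet-loss objective concrete, the following is a minimal sketch of a standard margin-based triplet loss in NumPy. It is not the authors' implementation: the function name `triplet_loss`, the Euclidean distance, and the margin of 1.0 are illustrative assumptions, and the paper's specific Tile2Vec modification is not reproduced here.

```python
import numpy as np


def triplet_loss(anchor, neighbor, distant, margin=1.0):
    """Margin-based triplet loss averaged over a batch of embeddings.

    Each argument has shape (batch, dim). The loss is zero for a triplet
    once the distant sample is at least `margin` farther from the anchor
    than the neighbor is. The margin value is an assumption for this
    sketch, not a value taken from the paper.
    """
    d_pos = np.linalg.norm(anchor - neighbor, axis=1)  # anchor to similar motif
    d_neg = np.linalg.norm(anchor - distant, axis=1)   # anchor to dissimilar motif
    return float(np.mean(np.maximum(0.0, d_pos - d_neg + margin)))


# Toy embeddings: the first triplet is already well separated, the second is not.
anchor = np.array([[0.0, 0.0], [1.0, 1.0]])
neighbor = np.array([[0.0, 1.0], [1.0, 1.0]])
distant = np.array([[3.0, 4.0], [1.0, 1.5]])
loss = triplet_loss(anchor, neighbor, distant)
```

In this toy batch only the second triplet contributes to the loss, since its distant sample sits within the margin of the anchor.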

Paper Structure (9 sections, 1 equation, 3 figures, 3 tables, 1 algorithm)

Introduction
Motif Mining
Birdcall Embedding
Experiments
Motif Mining Details
Embedding Clustering Quality
Classifier Performance
Discussion
Conclusions

Figures (3)

Figure 1: Spectrograms show frequency components of audio transformed via STFT. We apply SiMPle to obtain a matrix profile that summarizes the distance to the nearest neighbor for all time-slices in the spectrogram. Spectrogram parameterization affects the quality of the matrix profile summary.
Figure 2: Flow diagram of the constituent pieces of the birdcall embedding.
Figure 3: A scatter plot of a random set of motifs (n=300) drawn from three species of birds. The motifs are truncated, padded, and transformed into the embedding space. We plot the top two principal components found by PCA.
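
The matrix profile described in the Figure 1 caption can be illustrated with a brute-force sketch. This is not SiMPle itself, which uses a much faster algorithm; it is a naive self-join under assumed choices (Euclidean distance between flattened spectrogram windows, an exclusion zone of half the window length) and the hypothetical function name `naive_matrix_profile`. The spectrogram here is synthetic random data with one pattern planted twice.

```python
import numpy as np


def naive_matrix_profile(spec, window, exclusion=None):
    """Brute-force matrix profile over the time axis of a spectrogram.

    spec has shape (n_freq, n_time). For every window of `window` time
    slices, return the distance to its nearest non-overlapping neighbor
    and that neighbor's start index. Memory grows quadratically with the
    number of windows, so this is only suitable for small examples.
    """
    n_sub = spec.shape[1] - window + 1
    if n_sub < 2:
        raise ValueError("spectrogram is too short for this window length")
    if exclusion is None:
        exclusion = window // 2  # assumed exclusion zone around each window

    # Flatten each (n_freq, window) slice into one vector per start index.
    subs = np.stack([spec[:, i:i + window].ravel() for i in range(n_sub)])

    # Pairwise Euclidean distances between all windows.
    diff = subs[:, None, :] - subs[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # Ignore trivial matches: a window compared with itself or close overlaps.
    idx = np.arange(n_sub)
    dist[np.abs(idx[:, None] - idx[None, :]) <= exclusion] = np.inf

    return dist.min(axis=1), dist.argmin(axis=1)


# Synthetic spectrogram with the same 5-slice pattern planted at t=10 and t=40.
rng = np.random.default_rng(0)
spec = rng.random((8, 60))
pattern = rng.random((8, 5))
spec[:, 10:15] = pattern
spec[:, 40:45] = pattern

profile, index = naive_matrix_profile(spec, window=5)
motif_start = int(np.argmin(profile))  # start of the best-matching window
```

Low values in `profile` mark windows that recur elsewhere in the recording, which is what makes them candidate motifs; `index` gives the location of each window's closest match.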
