Perch 2.0 transfers 'whale' to underwater tasks

Andrea Burns, Lauren Harrell, Bart van Merriënboer, Vincent Dumoulin, Jenny Hamer, Tom Denton

TL;DR

This work assesses whether a large terrestrial bioacoustic foundation model, Perch 2.0, can transfer to marine mammal and underwater sound tasks via few-shot linear probing. Using Perch 2.0 embeddings on the NOAA PIPAN, ReefSet, and DCLDE 2026 datasets, the study compares against several other pretrained bioacoustic models, including marine-oriented ones, and finds strong transfer performance: Perch 2.0 generally outperforms the alternatives, with a single Known Bio Species case as the exception. The results, complemented by t-SNE visualizations, suggest that broad, richly labeled acoustic representations learned largely from terrestrial species such as birds can generalize to marine contexts, aided by model scale and by sound-production characteristics shared across taxa. The findings advocate agile, embedding-based transfer learning to accelerate the development of marine bioacoustic classifiers when labeled data are limited.
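
Few-shot linear probing begins by drawing a small labeled training set of k examples per class (the quantity varied in Figure 1). The Python sketch below shows one way to do this. It assumes, hypothetically, that a class is dropped whenever it cannot supply k training examples plus at least `min_test` held-out examples, which is one plausible reading of the class drops reported for NOAA PIPAN; the name `few_shot_split` and the `min_test` rule are illustrative, not the authors' protocol.

```python
import numpy as np


def few_shot_split(labels, k, min_test=1, seed=0):
    """Sample k training examples per class; the remaining examples are held out.

    Hypothetical rule: a class with fewer than k + min_test examples is dropped
    entirely for this value of k.
    Returns (train_idx, test_idx, kept_classes).
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    train_idx, test_idx, kept = [], [], []
    for c in np.unique(labels):  # classes in sorted order
        idx = np.flatnonzero(labels == c)
        if len(idx) < k + min_test:
            continue  # too few examples: drop this class for this k
        idx = rng.permutation(idx)
        train_idx.extend(idx[:k])   # exactly k training examples
        test_idx.extend(idx[k:])    # everything else is evaluation data
        kept.append(c)
    return np.array(train_idx, dtype=int), np.array(test_idx, dtype=int), kept
```

Repeating the split with several seeds and several values of k gives the kind of performance-versus-k curve that Figure 1 summarizes.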

Abstract

Perch 2.0 is a supervised bioacoustics foundation model pretrained on 14,597 species, including birds, mammals, amphibians, and insects, and it achieves state-of-the-art performance on multiple benchmarks. Because Perch 2.0's training data contain almost no marine mammal audio or classes, we evaluate its performance on marine mammal and underwater audio tasks through few-shot transfer learning. We perform linear probing on the embeddings generated by this foundation model and compare performance against other pretrained bioacoustics models. In particular, we compare Perch 2.0 with the earlier multispecies whale model, Perch 1.0, SurfPerch, AVES-bio, BirdAVES, and BirdNET v2.3, all of which have open-source tools for transfer learning and agile modeling. We show that Perch 2.0 embeddings deliver consistently high few-shot transfer performance, outperforming the alternative embedding models on the majority of tasks; Perch 2.0 is therefore recommended when developing new linear classifiers for marine mammal classification from few labeled examples.
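
Linear probing trains only a linear classifier on top of frozen embeddings. As a minimal sketch, assuming embeddings have already been extracted into an `(n, d)` NumPy array, a multinomial logistic-regression probe can be fit with plain gradient descent; `train_linear_probe`, `predict`, and the default learning rate, step count, and L2 weight are hypothetical choices, not the paper's training setup.

```python
import numpy as np


def train_linear_probe(x, y, num_classes, lr=0.05, steps=500, l2=1e-3):
    """Fit a softmax (multinomial logistic) classifier on frozen embeddings.

    x: (n, d) embedding matrix; y: (n,) integer class labels.
    Only w and b are learned; the embedding model is never updated.
    """
    n, d = x.shape
    w = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    onehot = np.eye(num_classes)[y]
    for _ in range(steps):
        logits = x @ w + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)             # softmax probabilities
        grad = (p - onehot) / n                       # d(mean cross-entropy)/d(logits)
        w -= lr * (x.T @ grad + l2 * w)               # L2 penalty (l2/2)*||w||^2
        b -= lr * grad.sum(axis=0)
    return w, b


def predict(x, w, b):
    """Return the highest-scoring class index for each embedding row."""
    return np.argmax(x @ w + b, axis=1)
```

Because the embedding model stays fixed, comparing Perch 2.0 against another model amounts to swapping the embedding matrix `x` while keeping the probe unchanged. With unnormalized embeddings of large magnitude, `lr` may need to be reduced for stable training.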

Paper Structure

This paper contains 8 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Performance of trained models on marine datasets as the number of training examples per class varies. *Class 'Bm' dropped for $k=16$; **classes 'Bm' and 'Be' dropped for $k=32$ in the NOAA PIPAN data.
  • Figure 2: t-SNE plot of Perch 2.0 embeddings for the DCLDE 2026 Ecotype data.
  • Figure 3: t-SNE plots on the DCLDE 2026 Ecotype dataset, which contains five ecotype variants of the killer whale (orca) species. Plots were generated with the scikit-learn PCA and t-SNE implementations; embeddings were first projected to 32 dimensions with PCA before t-SNE was applied.
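
The Figure 3 caption describes reducing embeddings to 32 dimensions with PCA before running t-SNE. The sketch below is a NumPy-only version of that PCA step, assuming embeddings are stacked row-wise in an array; `pca_project` is a hypothetical helper. It should agree with scikit-learn's `PCA` (with an exact SVD solver) up to the sign of each component.

```python
import numpy as np


def pca_project(x, n_components=32):
    """Project rows of x onto their top principal components via SVD.

    Returns (projected, explained_variance_ratio). The sign of each
    component is arbitrary, as with any SVD-based PCA.
    """
    x = np.asarray(x, dtype=float)
    n_components = min(n_components, *x.shape)  # cannot exceed n or d
    centered = x - x.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    projected = centered @ vt[:n_components].T
    explained = (s[:n_components] ** 2) / np.sum(s ** 2)
    return projected, explained
```

The 32-dimensional output would then be passed to a t-SNE implementation such as `sklearn.manifold.TSNE(n_components=2).fit_transform(z)` to obtain 2-D coordinates for plotting. Reducing dimensionality first is a common way to suppress noise and speed up t-SNE's pairwise-distance computations.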