BioME: A Resource-Efficient Bioacoustic Foundational Model for IoT Applications
Heitor R. Guimarães, Abhishek Tiwari, Mahsa Abdollahi, Anderson R. Avila, Tiago H. Falk
TL;DR
BioME targets the deployment of bioacoustic encoders on resource-constrained IoT devices. It distills a high-capacity BEATs teacher into a compact Transformer that uses grouped-query attention (GQA) and rotary position embeddings (RoPE), injects modulation-spectrum features through feature-wise linear modulation (FiLM), and is trained on multi-domain data. The resulting model reaches state-of-the-art or competitive results on BEANS and on acoustic beehive monitoring benchmarks with a 75% lower parameter count and reduced memory requirements, which makes edge deployment practical. The findings indicate that DSP-inspired inductive biases and layer-wise distillation can yield highly discriminative representations for diverse ecological tasks, supporting scalable, in-the-wild passive acoustic monitoring (PAM).
Abstract
Passive acoustic monitoring has become a key strategy in biodiversity assessment, conservation, and behavioral ecology, especially as Internet-of-Things (IoT) devices enable continuous in situ audio collection at scale. While recent self-supervised learning (SSL)-based audio encoders, such as BEATs and AVES, have shown strong performance in bioacoustic tasks, their computational cost and limited robustness to unseen environments hinder deployment on resource-constrained platforms. In this work, we introduce BioME, a resource-efficient audio encoder designed for bioacoustic applications. BioME is trained via layer-to-layer distillation from a high-capacity teacher model, enabling strong representational transfer while reducing the parameter count by 75%. To further improve ecological generalization, the model is pretrained on multi-domain data spanning speech, environmental sounds, and animal vocalizations. A key contribution is the integration of modulation-aware acoustic features via FiLM conditioning, injecting a DSP-inspired inductive bias that enhances feature disentanglement in low-capacity regimes. Across multiple bioacoustic tasks, BioME matches or surpasses the performance of larger models, including its teacher, while being suitable for resource-constrained IoT deployments. For reproducibility, code and pretrained checkpoints are publicly available.
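To make the two mechanisms named in the abstract concrete, the following is a minimal NumPy sketch of FiLM conditioning and a layer-to-layer distillation loss. It is an illustration under stated assumptions, not the released implementation: the function names `film` and `layerwise_distill_loss`, the linear maps `w_gamma` and `w_beta`, the use of a single pooled modulation-feature vector per clip, the one-to-one student/teacher layer pairing, the mean-squared-error objective, and all dimensions are hypothetical choices made here for clarity.

```python
import numpy as np


def film(hidden, cond, w_gamma, w_beta):
    """Feature-wise linear modulation (FiLM).

    hidden: (batch, time, dim) encoder features.
    cond:   (batch, cond_dim) conditioning vector (assumed here to be
            pooled modulation-spectrum features; hypothetical choice).
    w_gamma, w_beta: (cond_dim, dim) linear maps (hypothetical names).
    """
    gamma = cond @ w_gamma  # per-channel scale, shape (batch, dim)
    beta = cond @ w_beta    # per-channel shift, shape (batch, dim)
    # Broadcast the scale and shift across the time axis.
    return gamma[:, None, :] * hidden + beta[:, None, :]


def layerwise_distill_loss(student_layers, teacher_layers, projections):
    """Average MSE between projected student layers and paired teacher layers.

    Assumes a one-to-one pairing and an MSE objective; the paper may use
    a different pairing or loss.
    """
    losses = []
    for s, t, p in zip(student_layers, teacher_layers, projections):
        # Project the student width up to the teacher width, then compare.
        losses.append(np.mean((s @ p - t) ** 2))
    return float(np.mean(losses))


# Toy usage with made-up dimensions.
rng = np.random.default_rng(0)
batch, time, d_student, d_teacher, d_mod = 2, 50, 192, 768, 32

hidden = rng.standard_normal((batch, time, d_student))
mod_feats = rng.standard_normal((batch, d_mod))
w_gamma = rng.standard_normal((d_mod, d_student)) * 0.1
w_beta = rng.standard_normal((d_mod, d_student)) * 0.1
modulated = film(hidden, mod_feats, w_gamma, w_beta)

student_layers = [rng.standard_normal((batch, time, d_student)) for _ in range(3)]
teacher_layers = [rng.standard_normal((batch, time, d_teacher)) for _ in range(3)]
projections = [rng.standard_normal((d_student, d_teacher)) * 0.05 for _ in range(3)]
loss = layerwise_distill_loss(student_layers, teacher_layers, projections)
```

In this sketch the conditioning signal only rescales and shifts existing channels, so it adds few parameters, which is one plausible reason such conditioning suits a low-capacity student.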
