On Feature Learning for Titi Monkey Activity Detection

Aditya Ravuri; Jen Muir; Neil D. Lawrence

TL;DR

This work addresses the challenge of detecting Coppery titi monkey vocal activity from limited labeled data by combining MFCC features with a bidirectional LSTM classifier to model $P(c_t=1|\mathbf{a})$ for five-second audio segments. A beam-search decoder is used to identify call segments, achieving approximately 95% instance accuracy and 82% conditional accuracy, with outputs biased toward smooth probability trajectories. A second-stage classifier operating on averaged latent representations achieves about 75% accuracy on unseen data for call-type labeling, with wav2vec features offering marginal gains for classification. The framework demonstrates robust real-world bioacoustic monitoring potential and can be extended to detect non-linear phenomena and other species, contributing to ecological research and wildlife conservation.
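The decoding step can be illustrated with a minimal sketch. It assumes the classifier has already produced one probability $P(c_t=1|\mathbf{a})$ per frame, and it scores binary label paths by their log-likelihood minus a penalty for each label change. The names `beam_search_decode` and `labels_to_segments`, and the parameters `beam_width` and `switch_penalty`, are hypothetical and not taken from the paper. The switch penalty is an assumed way of favouring smooth label paths and may differ from the authors' decoder.

```python
import math


def beam_search_decode(probs, beam_width=4, switch_penalty=2.0):
    """Return a binary label path for per-frame call probabilities.

    Illustrative sketch only: the scoring rule (log-likelihood minus a
    penalty per label change) is an assumption, not the paper's decoder.
    """
    eps = 1e-9
    beams = [(0.0, [])]  # each beam is (score, labels so far)
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        candidates = []
        for score, labels in beams:
            for label in (0, 1):
                step = math.log(p if label == 1 else 1.0 - p)
                if labels and labels[-1] != label:
                    step -= switch_penalty  # discourage rapid flipping
                candidates.append((score + step, labels + [label]))
        # keep only the highest-scoring partial paths
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][1]


def labels_to_segments(labels):
    """Convert a 0/1 path into half-open (start, end) frame ranges of calls."""
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label == 1 and start is None:
            start = i
        elif label == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments


# Toy probabilities with a brief dip (0.4) inside an otherwise clear call.
probs = [0.1, 0.2, 0.9, 0.4, 0.95, 0.9, 0.1, 0.05]
print(labels_to_segments(beam_search_decode(probs)))  # [(2, 6)]
print(labels_to_segments(beam_search_decode(probs, switch_penalty=0.0)))  # [(2, 3), (4, 6)]
```

With the penalty set to zero the decoder reduces to per-frame thresholding at 0.5 and splits the call at the dip; with a positive penalty the dip is absorbed into a single segment.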

Abstract

This paper, a technical summary of our preceding publication, introduces a robust machine learning framework for detecting the vocal activity of Coppery titi monkeys. Using MFCC features with a bidirectional LSTM-based classifier, we address the challenge posed by the small amount of expert-annotated vocal data available. Our approach significantly reduces false positives and improves the accuracy of call detection in bioacoustic research. Initial results show an accuracy of 95% on instance predictions, highlighting the effectiveness of our model in identifying and classifying complex vocal patterns in environmental audio recordings. Moreover, we show how call classification can be performed downstream, paving the way for real-world monitoring.
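The downstream classification step, described in the TL;DR as a second-stage classifier operating on averaged latent representations, can be sketched as follows. The sketch assumes each detected call is available as a sequence of fixed-length latent vectors. The paper does not specify the classifier here, so a nearest-centroid rule is used purely as a stand-in. The function names, the toy vectors, and the labels `type_a` and `type_b` are all hypothetical.

```python
def mean_pool(frames):
    """Average a sequence of equal-length latent vectors over time."""
    n = len(frames)
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / n for d in range(dim)]


def fit_centroids(pooled, labels):
    """Compute one mean vector per call type (stand-in classifier)."""
    sums, counts = {}, {}
    for vec, label in zip(pooled, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for d, v in enumerate(vec):
            acc[d] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Return the label whose centroid is closest in squared distance."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], vec))


# Toy latent sequences (2-dimensional) for two hypothetical call types.
train_calls = [
    [[0.0, 1.0], [0.2, 0.8]],   # type_a
    [[0.1, 0.9], [0.1, 1.1]],   # type_a
    [[1.0, 0.0], [0.8, 0.2]],   # type_b
]
train_labels = ["type_a", "type_a", "type_b"]

centroids = fit_centroids([mean_pool(c) for c in train_calls], train_labels)
new_call = [[0.9, 0.1], [1.1, 0.1], [1.0, 0.0]]
print(predict(centroids, mean_pool(new_call)))  # type_b
```

Averaging over time yields one fixed-length vector per call regardless of its duration, which is what lets a simple classifier be applied to segments of varying length.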

Paper Structure (4 sections, 4 figures)

Introduction
Activity Detection Methodology
Call Classification
Conclusion

Figures (4)

Figure 1: Illustration of model architecture.
Figure 2: Illustration of model predictions, showing smoothness of output probabilities.
Figure 3: Classification metrics over 1000 epochs of model training, as the number of files used for training is varied.
Figure 4: t-SNE of average wav2vec-based features coloured by call type. MFCC-based plots look similar.
