Feature Representations for Automatic Meerkat Vocalization Classification
Imen Ben Mahmoud, Eklavya Sarkar, Marta Manser, Mathew Magimai.-Doss
TL;DR
This work tackles automatic meerkat vocalization classification by comparing several feature representations: knowledge-based hand-crafted features (Catch22, ComParE, eGeMAPS), neural representations from self-supervised learning (SSL) models (WavLM, wav2vec2, HuBERT), and an end-to-end CNN approach that yields CNN-crafted features. The authors evaluate these representations on two real-world datasets (Set A and Set B) using an SVM framework with 5-fold cross-validation and Unweighted Average Recall (UAR) as the metric. Results show that CNN-crafted features yield the best performance, while hand-crafted features such as eGeMAPS and ComParE remain competitive; lower-layer SSL embeddings also outperform higher-layer ones, which indicates that human-speech pretraining transfers effectively to meerkat calls. The findings demonstrate that diverse feature representations, spanning traditional signal processing, SSL embeddings, and task-specific CNN features, can effectively support automatic meerkat call classification, and they motivate further work on interpreting the acoustic cues involved.
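The UAR metric mentioned above is the mean of the per-class recalls, so each call type counts equally no matter how many examples it has. The following is a minimal sketch of that computation, not the authors' evaluation code; the function name and the toy labels ("close", "alarm") are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (UAR).

    Each class contributes equally, regardless of how many samples it has.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy labels (not taken from Set A or Set B).
y_true = ["close"] * 8 + ["alarm"] * 2
# A degenerate classifier that always predicts the majority class.
y_pred = ["close"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
uar = unweighted_average_recall(y_true, y_pred)

print(f"accuracy = {accuracy:.2f}")  # 0.80
print(f"UAR      = {uar:.2f}")       # 0.50
```

In this toy case the majority-class predictor reaches 80% accuracy but only 50% UAR. This illustrates why UAR is the more informative metric when call types are unevenly represented.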
Abstract
Understanding the evolution of vocal communication in social animals is an important research problem. In that context, beyond humans, there is interest in analyzing the vocalizations of other social animals such as meerkats, marmosets, and apes. While existing approaches address the vocalizations of certain species, a reliable method tailored to meerkat calls is lacking. To that end, this paper investigates feature representations for automatic meerkat vocalization analysis. Both traditional signal processing-based representations and data-driven representations facilitated by advances in deep learning are explored. Call type classification studies conducted on two data sets reveal that feature extraction methods developed for human speech processing can be effectively employed for automatic meerkat call analysis.
