An automatic analysis of ultrasound vocalisations for the prediction of interaction context in captive Egyptian fruit bats
Andreas Triantafyllopoulos, Alexander Gebhard, Manuel Milling, Simon Rampp, Björn Schuller
TL;DR
The paper tackles the problem of decoding interaction context from ultrasound vocalisations (USVs) in captive Egyptian fruit bats, moving beyond mere presence detection to infer social states. It compares handcrafted pitch features with spectrogram-based deep neural networks (ResNet50 and EfficientNet-B0) on a large, annotated USV dataset (Prat17-AAD), using 3-fold subject-independent cross-validation and transfer learning. The CNN-based approach achieves the higher unweighted average recall (UAR), around 33% across 11 contexts. This is well above the chance level of roughly 9% (1/11) and indicates that automatic context prediction from vocalisations is feasible. The work highlights both the potential for noninvasive monitoring of animal states and the need to further explore temporal dynamics and hidden subclass structures within contexts. Overall, the study points to a promising direction for the automated analysis of animal communication and health monitoring using deep learning on acoustic data.
Abstract
Prior work in computational bioacoustics has mostly focused on detecting the presence of animals in a particular habitat. However, animal sounds contain much richer information than mere presence; among other things, they encapsulate the interactions of those animals with other members of their species. Studying these interactions is almost impossible in a naturalistic setting, as the ground truth is often lacking. Using animals in captivity offers a viable alternative. However, most prior works follow a traditional, statistics-based approach to analysing interactions. In the present work, we go beyond this standard framework by attempting to predict the underlying context of interactions between captive Rousettus aegyptiacus using deep neural networks. We reach an unweighted average recall of over 30%, more than thrice the chance level, and show error patterns that differ from our statistical analysis. This work thus represents an important step towards the automatic analysis of states in animals from sound.
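The headline metric may be easier to interpret with a small worked example. The following is a minimal sketch of how unweighted average recall (UAR) is commonly computed, namely as the mean of the per-class recalls; it is not the authors' evaluation code. The function name and the toy labels are hypothetical, and the value of 11 classes is taken from the summary above.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls: every class counts equally, whatever its size."""
    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # number of correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Toy labels (hypothetical, not taken from the dataset): class 0 dominates.
y_true = [0, 0, 0, 0, 1, 2]
y_pred = [0, 0, 0, 0, 0, 2]

# Per-class recalls are 1.0, 0.0 and 1.0, so UAR is their mean.
uar = unweighted_average_recall(y_true, y_pred)

# Plain accuracy is 5/6, inflated by the majority class.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# With 11 contexts, a classifier that always predicts the same class
# (or guesses uniformly at random) has an expected UAR of 1/11.
num_contexts = 11
chance_uar = 1 / num_contexts

print(f"UAR={uar:.3f}, accuracy={accuracy:.3f}, chance UAR={chance_uar:.3f}")
```

Under these assumptions, three times the chance level is 3/11, or about 27.3%, so a UAR of over 30% is consistent with the "more than thrice the chance level" statement. UAR is often preferred over accuracy when classes are imbalanced, because a majority class cannot dominate the score.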
