Enhancing the analysis of murine neonatal ultrasonic vocalizations: Development, evaluation, and application of different mathematical models
Rudolf Herdt, Louisa Kinzel, Johann Georg Maaß, Marvin Walther, Henning Fröhlich, Tim Schubert, Peter Maass, Christian Patrick Schaaf
TL;DR
The study addresses the challenge of reliably analyzing neonatal murine USVs with a two-stage pipeline that first detects calls using an entropy-based spectrogram method and then classifies them with one of several neural networks. In 10-fold cross-validation on a sizable Nr2f1-derived dataset, EfficientNet-B5 and a compact custom CNN achieve the best classification performance, at around 87% accuracy. A semi-automated mode uses confidence thresholds to classify only high-confidence calls automatically, which substantially reduces manual review while retaining high recall. The authors provide interpretability analyses (channel visualizations and saliency maps) showing that the models attend to core spectrotemporal features, and they demonstrate practical utility by detecting quantitative and qualitative USV differences in autism-like mouse lines during development. The framework enables high-throughput, scalable phenotyping of neonatal USVs, could be extended with attention mechanisms or contrastive learning, and may be applicable more broadly in rodent communication studies.
Abstract
Rodents employ a broad spectrum of ultrasonic vocalizations (USVs) for social communication. Because these vocalizations offer valuable insights into the affective states, social interactions, and developmental stages of animals, various deep learning approaches have aimed to automate both the quantitative (detection) and qualitative (classification) analysis of USVs. Here, we present the first systematic evaluation of different types of neural networks for USV classification. We assessed various feedforward networks, including a custom-built fully connected network, a custom-built convolutional neural network, different residual neural networks (ResNets), an EfficientNet, and a Vision Transformer (ViT). Paired with a refined, entropy-based detection algorithm (recall of 94.9% and precision of 99.3%), the best architecture (86.79% accuracy) was integrated into a fully automated pipeline capable of analyzing extensive USV datasets with high reliability. Additionally, users can specify a minimum accuracy threshold suited to their research needs. In this semi-automated setup, the pipeline classifies only those calls to which it assigns a high pseudo-probability and leaves the rest for manual inspection. Our study focuses exclusively on neonatal USVs. As part of an ongoing phenotyping study, our pipeline has proven to be a valuable tool for identifying key differences in USVs produced by mice with autism-like behaviors.
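The semi-automated mode described above can be illustrated with a minimal sketch. It assumes that the classifier outputs one vector of raw scores (logits) per detected call, that pseudo-probabilities are obtained with a softmax, and that the user-facing accuracy threshold has already been translated into a pseudo-probability cutoff. The function names `softmax` and `triage`, the example logits, and the cutoff of 0.8 are hypothetical and are not taken from the pipeline itself.

```python
import math


def softmax(logits):
    """Convert raw class scores into pseudo-probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def triage(logits_per_call, threshold):
    """Split calls into automatically classified and manual-review sets.

    A call is classified automatically only if its highest
    pseudo-probability reaches the (assumed) cutoff `threshold`.
    """
    auto, manual = [], []
    for idx, logits in enumerate(logits_per_call):
        probs = softmax(logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= threshold:
            auto.append((idx, best, probs[best]))  # (call index, class, confidence)
        else:
            manual.append(idx)  # left for manual inspection
    return auto, manual


# Hypothetical logits for three detected calls and three call classes.
example_logits = [
    [4.0, 0.5, 0.1],  # confident prediction for class 0
    [1.0, 0.9, 0.8],  # ambiguous, falls below the cutoff
    [0.2, 3.5, 0.3],  # confident prediction for class 1
]
auto, manual = triage(example_logits, threshold=0.8)
print("auto-classified:", [(i, c) for i, c, _ in auto])
print("manual review:", manual)
```

Raising the cutoff in such a scheme sends more calls to manual inspection in exchange for higher accuracy on the automatically classified remainder, which is the trade-off that the user-specified threshold controls.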
