Lightweight Hopfield Neural Networks for Bioacoustic Detection and Call Monitoring of Captive Primates

Wendy Lomas, Andrew Gascoyne, Colin Dubreuil, Stefano Vaglio, Liam Naughton

TL;DR

This study proposes a lightweight Hopfield neural network-based approach for automated bioacoustic detection of captive primate vocalisations, addressing the data backlog and resource demands of CNN-based methods. By storing FFT peak patterns and, in a second model, including a noise class, the authors demonstrate fast, transparent associative memory capable of classifying 1-second audio segments with high accuracy on consumer hardware. On a dataset of captive black-and-white ruffed lemurs, Model 2 achieves an overall accuracy of $0.94$, with improved precision for alarm calls and recall for grumbles compared to Model 1, while maintaining high non-call detection. The work argues for rapid retraining, applicability in both captive and wild settings, and a human-in-the-loop framework to accelerate data-to-insight turnaround without large labeled datasets or image-based preprocessing, contributing a scalable, explainable alternative to CNNs for PAM-based welfare monitoring and conservation.

Abstract

Passive acoustic monitoring is a sustainable method of monitoring wildlife and environments that generates large datasets and, currently, a processing backlog. Academic research into automating this process is focused on the application of resource-intensive convolutional neural networks, which require large pre-labelled datasets for training and lack flexibility in application. We present a viable alternative relevant in both wild and captive settings: a transparent, lightweight and fast-to-train associative memory AI model with Hopfield neural network (HNN) architecture. Adapted from a model developed to detect bat echolocation calls, this model monitors the vocalisations of captive endangered black-and-white ruffed lemurs (Varecia variegata). Lemur social calls of interest when monitoring welfare are stored in the HNN in order to detect other call instances across the larger acoustic dataset. We make significant model improvements by storing an additional signal caused by movement and achieve an overall accuracy of 0.94. The model can perform $340$ classifications per second, processing over 5.5 hours of audio data per minute, on a standard laptop running other applications. It has broad applicability and trains in milliseconds. Our lightweight solution reduces data-to-insight turnaround times and can accelerate decision-making in both captive and wild settings.
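The throughput figures in the abstract can be cross-checked with a short calculation. The sketch below assumes each classification covers one non-overlapping 1-second segment (the segment length given in the TL;DR); the variable names are illustrative only.

```python
# Figures taken from the abstract and TL;DR.
classifications_per_second = 340
segment_seconds = 1.0  # assumption: segments do not overlap

# Audio processed in one minute of wall-clock time.
audio_seconds_per_minute = classifications_per_second * segment_seconds * 60
audio_hours_per_minute = audio_seconds_per_minute / 3600

print(f"{audio_hours_per_minute:.2f} hours of audio per minute")
```

Under that assumption the result is about 5.67 hours of audio per minute, consistent with the "over 5.5 hours" stated in the abstract.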

Paper Structure

This paper contains 10 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Model 1 training: sound files are collected (A); two specific calls to be monitored are identified (B); the fast Fourier transform (FFT) is applied to these signals and peak frequencies are extracted from the frequency range of interest (C); neurons fire (NF) based on the FFT peaks and the network is activated (blue) for each signal (D); these network activations are then combined via Hebbian learning (HL) (see the Hebbian learning equation) and the trained model is ready for activation/classification (E). Green and red lines represent positive and negative neuron associations (weights) respectively, and grey lines represent disassociated neurons (zero weights).
  • Figure 2: Model 2 training: signals are collected (A); two specific calls to be monitored and one "noise" to be filtered are identified (B); the FFT is applied to each signal and peak frequencies are extracted (C); neurons fire (NF) based on the FFT peaks and the network is activated (blue) for each signal to be stored (D); network activations are combined by Hebbian learning (HL) (see the Hebbian learning equation); the trained model is ready for activation/classification (E). Green and red lines represent positive and negative neuron associations (weights) respectively, and grey lines represent disassociated neurons (zero weights).
  • Figure 3: Full spectrum visualisation of a grumble from $286.6$ - $290.2$ seconds as well as noises from lemur movement around the enclosure ($282$ - $286$ s). The spectrogram shows time (s), frequency (kHz) and amplitude as represented by colour intensity, with red equating to high intensity/dB, and blue low intensity.
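
The training steps described in Figures 1 and 2 (FFT, peak extraction, neuron firing, Hebbian learning, then activation for classification) can be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the 16-neuron network, the synthetic two-tone "calls", the banding of the whole spectrum (rather than a frequency range of interest), and the asynchronous update rule are all hypothetical choices made for the example.

```python
import numpy as np

def fft_peak_pattern(signal, sample_rate, n_neurons=16, n_peaks=2):
    """Bipolar pattern: +1 for the frequency bands holding the strongest FFT peaks."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Assumption: one neuron per equal-width band from 0 Hz to Nyquist.
    edges = np.linspace(0.0, sample_rate / 2.0, n_neurons + 1)
    band = np.clip(np.digitize(freqs, edges) - 1, 0, n_neurons - 1)
    band_peak = np.zeros(n_neurons)
    np.maximum.at(band_peak, band, spectrum)  # strongest magnitude per band
    pattern = -np.ones(n_neurons, dtype=int)
    pattern[np.argsort(band_peak)[-n_peaks:]] = 1  # these neurons "fire"
    return pattern

def hebbian_weights(patterns):
    """Sum of outer products of the stored patterns, with a zero diagonal."""
    patterns = np.asarray(patterns, dtype=float)
    weights = patterns.T @ patterns / patterns.shape[1]
    np.fill_diagonal(weights, 0.0)
    return weights

def recall(weights, state, max_sweeps=20):
    """Asynchronous updates in fixed order until no neuron changes."""
    state = np.array(state, dtype=int)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(state)):
            h = weights[i] @ state
            new = state[i] if h == 0 else (1 if h > 0 else -1)
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

def classify(weights, stored, state):
    """Label of the stored pattern the network settles on, else 'unknown'."""
    settled = recall(weights, state)
    for label, pattern in stored.items():
        if np.array_equal(settled, pattern):
            return label
    return "unknown"

def tone_mix(freqs_hz, sample_rate=8000, seconds=1.0):
    """Synthetic stand-in for a recorded call: a sum of pure tones."""
    t = np.arange(int(sample_rate * seconds)) / sample_rate
    return sum(np.sin(2 * np.pi * f * t) for f in freqs_hz)

sample_rate = 8000
stored = {
    "call_a": fft_peak_pattern(tone_mix([600, 1600]), sample_rate),
    "call_b": fft_peak_pattern(tone_mix([3100, 3600]), sample_rate),
}
weights = hebbian_weights(list(stored.values()))

# Probe: call_a with one of its peaks missing.
probe = stored["call_a"].copy()
probe[np.argmax(probe == 1)] = -1
label = classify(weights, stored, probe)
print(label)
```

In this toy setup the degraded probe settles back onto the stored `call_a` pattern. A Model 2 style variant would pass a third, "noise" pattern to `hebbian_weights` in the same way; whether recall stays reliable with more stored patterns depends on how much the patterns overlap.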