Unsupervised anomaly detection in large-scale estuarine acoustic telemetry data
Siphendulwe Zaza, Marcellin Atemkeng, Taryn S. Murray, John David Filmalter, Paul D. Cowley
TL;DR
This work tackles the challenge of detecting anomalous movements in large-scale acoustic telemetry data. It applies unsupervised learning, using a neural network autoencoder (NN-AE) with a threshold-finding mechanism, to a dataset of detections from 50 dusky kob in the Breede Estuary spanning 2016–2021. The NN-AE achieved perfect recall (no false negatives) with negligible false positives, outperforming traditional unsupervised methods, which suffered high false-negative rates; resampling further improved performance. The study provides a practical, scalable framework for automated anomaly detection in ecological telemetry, with implications for data integrity, movement ecology interpretations, and conservation decision-making. It also notes the computational demands of the approach and opportunities for model enhancements and broader generalization.
Abstract
Acoustic telemetry data plays a vital role in understanding the behaviour and movement of aquatic animals. However, these datasets, which often consist of millions of individual data points, frequently contain anomalous movements that pose significant challenges. Traditionally, anomalous movements are identified either manually or through basic statistical methods, approaches that are time-consuming and prone to high rates of unidentified anomalies in large datasets. This study focuses on the development of automated classifiers for a large telemetry dataset comprising detections from fifty acoustically tagged dusky kob monitored in the Breede Estuary, South Africa. Using an array of 16 acoustic receivers deployed throughout the estuary between 2016 and 2021, we collected over three million individual data points. We present detailed guidelines for data pre-processing, resampling strategies, labelling, feature engineering, data splitting methodologies, and the selection and interpretation of machine learning and deep learning models for anomaly detection. Among the evaluated models, the neural network autoencoder (NN-AE) demonstrated superior performance, aided by our proposed threshold-finding algorithm. The NN-AE achieved high recall with no false normals (i.e., no anomalous movements misclassified as normal patterns), a critical factor in ensuring that no true anomalies are overlooked. In contrast, other models exhibited false normal fractions exceeding 0.9, indicating that they failed to detect the majority of true anomalies, a significant limitation for telemetry studies where undetected anomalies can distort interpretations of movement patterns. While the NN-AE's performance highlights its reliability and robustness in detecting anomalies, it faced challenges in accurately learning normal movement patterns when these patterns gradually deviated from anomalous ones.
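To make the "no false normals" criterion concrete, the following is a minimal sketch of how a reconstruction-error threshold and a false normal fraction could be computed. It is not the threshold-finding algorithm proposed in this work. It assumes that an autoencoder has already produced one reconstruction error per movement, that a movement is flagged as anomalous when its error is at or above the threshold, that labelled anomalies are available for validation, and that the false normal fraction is the share of true anomalies classified as normal. The function names `zero_miss_threshold` and `false_normal_fraction` and all numbers are hypothetical.

```python
import numpy as np


def zero_miss_threshold(errors, is_anomaly):
    """Largest threshold that still flags every labelled anomaly.

    Assumed rule: a movement is flagged when error >= threshold, so the
    smallest error among labelled anomalies is the highest safe threshold.
    """
    errors = np.asarray(errors, dtype=float)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    if not is_anomaly.any():
        raise ValueError("need at least one labelled anomaly to set the threshold")
    return float(errors[is_anomaly].min())


def false_normal_fraction(errors, is_anomaly, threshold):
    """Share of true anomalies that fall below the threshold (assumed definition)."""
    errors = np.asarray(errors, dtype=float)
    is_anomaly = np.asarray(is_anomaly, dtype=bool)
    flagged = errors >= threshold
    n_anomalies = int(is_anomaly.sum())
    if n_anomalies == 0:
        return 0.0
    missed = int(np.sum(is_anomaly & ~flagged))
    return missed / n_anomalies


# Made-up reconstruction errors and validation labels (1 = anomalous movement).
errors = [0.10, 0.20, 0.15, 0.90, 1.20, 0.30]
labels = [0, 0, 0, 1, 1, 0]

threshold = zero_miss_threshold(errors, labels)
fnf_zero = false_normal_fraction(errors, labels, threshold)  # every anomaly flagged
fnf_high = false_normal_fraction(errors, labels, 1.0)        # stricter threshold misses one
```

In this toy setting, raising the threshold above the smallest anomalous error trades fewer false alarms for missed anomalies, which is the trade-off the false normal fraction is meant to expose.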
