Unsupervised anomaly detection in large-scale estuarine acoustic telemetry data

Siphendulwe Zaza, Marcellin Atemkeng, Taryn S. Murray, John David Filmalter, Paul D. Cowley

TL;DR

This work tackles the challenge of detecting anomalous movements in large-scale acoustic telemetry data. It applies unsupervised learning, with a neural networks autoencoder (NN-AE) and a threshold-finding mechanism, to a dataset of detections from 50 dusky kob in the Breede Estuary, spanning 2016–2021. The NN-AE achieved perfect recall (no false negatives) with negligible false positives, outperforming traditional unsupervised methods that suffered high false-negative rates; resampling further improved performance. The study provides a practical, scalable framework for automated anomaly detection in ecological telemetry, with implications for data integrity, movement ecology interpretations, and conservation decision-making, while noting computational demands and opportunities for model enhancements and broader generalization.

Abstract

Acoustic telemetry data plays a vital role in understanding the behaviour and movement of aquatic animals. However, these datasets, which often consist of millions of individual data points, frequently contain anomalous movements that pose significant challenges. Traditionally, anomalous movements are identified either manually or through basic statistical methods, approaches that are time-consuming and prone to high rates of unidentified anomalies in large datasets. This study focuses on the development of automated classifiers for a large telemetry dataset comprising detections from fifty acoustically tagged dusky kob monitored in the Breede Estuary, South Africa. Using an array of 16 acoustic receivers deployed throughout the estuary between 2016 and 2021, we collected over three million individual data points. We present detailed guidelines for data pre-processing, resampling strategies, the labelling process, feature engineering, data splitting methodologies, and the selection and interpretation of machine learning and deep learning models for anomaly detection. Among the evaluated models, the neural networks autoencoder (NN-AE) demonstrated superior performance, aided by our proposed threshold-finding algorithm. The NN-AE achieved high recall with no false normals (i.e., no misclassifications of anomalous movements as normal patterns), a critical factor in ensuring that no true anomalies are overlooked. In contrast, other models exhibited false normal fractions exceeding 0.9, indicating that they failed to detect the majority of true anomalies, a significant limitation for telemetry studies where undetected anomalies can distort interpretations of movement patterns. While the NN-AE's performance highlights its reliability and robustness in detecting anomalies, it faced challenges in accurately learning normal movement patterns when these patterns gradually deviated from anomalous ones.
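
To make the "false normal" terminology concrete, the following minimal Python sketch shows how reconstruction errors from an autoencoder might be turned into anomaly flags and scored. It rests on several assumptions that are not taken from the paper: the false normal fraction is read as the share of true anomalies labelled normal, the function names are hypothetical, and the percentile rule is a simple stand-in for the authors' threshold-finding algorithm, which is not described in this section. The error values are invented for illustration only.

```python
import numpy as np


def percentile_threshold(train_errors, q=99.0):
    """Stand-in threshold rule (an assumption, not the paper's algorithm):
    take the q-th percentile of reconstruction errors on training data."""
    return float(np.percentile(np.asarray(train_errors, dtype=float), q))


def flag_anomalies(errors, threshold):
    """Label a detection 1 (anomalous) when its reconstruction error
    exceeds the threshold, otherwise 0 (normal)."""
    return (np.asarray(errors, dtype=float) > threshold).astype(int)


def false_normal_fraction(y_true, y_pred):
    """Assumed definition: share of true anomalies (1) predicted normal (0)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    anomalies = y_true == 1
    if not anomalies.any():
        return 0.0  # no true anomalies, so none can be missed
    return float(np.mean(y_pred[anomalies] == 0))


# Illustrative, made-up reconstruction errors (not data from the study).
train_errors = [0.10, 0.20, 0.15, 0.12, 0.18]
test_errors = [0.11, 0.90, 0.17, 1.40]
y_true = [0, 1, 0, 1]

threshold = percentile_threshold(train_errors, q=100.0)
y_pred = flag_anomalies(test_errors, threshold)
print("threshold:", threshold)
print("false normal fraction:", false_normal_fraction(y_true, y_pred))
```

Under this reading, a detector that labels every detection as normal has a false normal fraction of 1.0, which is the failure mode the abstract attributes to the other models.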

Paper Structure

This paper contains 17 sections, 4 equations, 18 figures, 12 tables.

Figures (18)

  • Figure 1: Map of the Breede Estuary showing the locations of acoustic receivers (black pins) deployed to monitor dusky kob movements in the estuary between 2016 and 2021.
  • Figure 4: Flowchart for anomaly detection in the acoustic telemetry test dataset comprising 50 dusky kob tagged and monitored in the Breede Estuary between 2016 and 2021.
  • Figure 5: Examples of fish detections across three different days, 2017-01-15 (a), 2016-12-22 (b) and 2016-12-10 (c), show irregular sampling patterns in acoustic telemetry data. The horizontal axis represents the time of day, while the vertical markers differentiate between normal (red) and anomalous (blue) detections. The varying frequency and irregular intervals of detections show the challenges of missing data and the importance of resampling strategies to preserve signal integrity and improve unsupervised classifier performance.
  • Figure 6: Comparison of resampled normal fish detections across three different days: 2017-01-15 (a), 2016-12-22 (b) and 2016-12-10 (c). Each day is resampled with the smallest sampling rate among the three days, approximating regular intervals. The resampling strategy is designed to adjust for irregular sampling patterns and ultimately improve the performance of unsupervised classifiers.
  • Figure 7: The diagram shows an anomaly detection workflow with data pre-processing, followed by data splitting into training and testing sets, and concludes with model training and testing for both the NN-AE and traditional models used in this work.
  • ...and 13 more figures