Uncovering Anomalous Events for Marine Environmental Monitoring via Visual Anomaly Detection

Laura Weihl, Stefan H. Bengtson, Nejc Novak, Malte Pedersen

TL;DR

This work tackles scalable marine event discovery from extensive underwater video by applying visual anomaly detection (VAD) and introducing AURA, a multi-annotator underwater VAD benchmark. It evaluates four VAD models on two marine scenes, highlighting the value of soft and consensus annotations to account for subjectivity in event boundaries. The study shows that peak-based frame selection identifies anomalous segments more robustly than thresholding, and that model performance depends strongly on the amount of training data and on the variability of the visual content that defines "normal" scenes. Collectively, the results support using multi-annotator labels and frame-level segmentation to enable scalable, camera-agnostic marine biodiversity monitoring.
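To make the contrast between thresholding and peak-based frame selection concrete, the following is a minimal sketch operating on a list of per-frame anomaly scores. It is not the authors' implementation: the function names, the `min_height` and `half_width` parameters, and the fixed window around each peak are assumptions chosen for illustration only.

```python
def threshold_segments(scores, threshold):
    """Return inclusive (start, end) frame ranges where score > threshold."""
    segments = []
    start = None
    for i, s in enumerate(scores):
        if s > threshold and start is None:
            start = i  # a new above-threshold run begins
        elif s <= threshold and start is not None:
            segments.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))  # run extends to the last frame
    return segments


def peak_segments(scores, min_height, half_width):
    """Return inclusive (start, end) windows centred on local score maxima.

    Assumption: a peak is an interior frame whose score is at least
    min_height, strictly greater than its left neighbour, and not smaller
    than its right neighbour. Overlapping windows are merged.
    """
    n = len(scores)
    segments = []
    for i in range(1, n - 1):
        is_peak = (
            scores[i] >= min_height
            and scores[i] > scores[i - 1]
            and scores[i] >= scores[i + 1]
        )
        if not is_peak:
            continue
        start = max(0, i - half_width)
        end = min(n - 1, i + half_width)
        if segments and start <= segments[-1][1]:
            segments[-1] = (segments[-1][0], end)  # merge with previous window
        else:
            segments.append((start, end))
    return segments


# Hypothetical anomaly scores with a brief dip in the middle of one event.
scores = [0.1, 0.2, 0.6, 0.9, 0.55, 0.45, 0.7, 0.2, 0.1]
by_threshold = threshold_segments(scores, threshold=0.5)
by_peaks = peak_segments(scores, min_height=0.5, half_width=2)
```

In this toy input, the dip at frame 5 splits the thresholded result into two segments, whereas the windows around the two peaks overlap and merge into a single segment. This only illustrates one way a peak-based rule can be less sensitive to score fluctuations; the paper's actual selection procedure may differ.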

Abstract

Underwater video monitoring is a promising strategy for assessing marine biodiversity, but the vast volume of uneventful footage makes manual inspection highly impractical. In this work, we explore the use of visual anomaly detection (VAD) based on deep neural networks to automatically identify interesting or anomalous events. We introduce AURA, the first multi-annotator benchmark dataset for underwater VAD, and evaluate four VAD models across two marine scenes. We demonstrate the importance of robust frame selection strategies to extract meaningful video segments. Our comparison against multiple annotators reveals that VAD performance of current models varies dramatically and is highly sensitive to both the amount of training data and the variability in visual content that defines "normal" scenes. Our results highlight the value of soft and consensus labels and offer a practical approach for supporting scientific exploration and scalable biodiversity monitoring.
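As a rough illustration of how soft and consensus labels can be derived from several annotators, the sketch below assumes each annotator provides a binary label per frame (1 for anomalous, 0 for normal). The soft label is taken as the fraction of annotators marking a frame, and the consensus label as a strict majority vote; both definitions, along with the function names and the `min_agreement` parameter, are assumptions for illustration and may not match the definitions used in the paper.

```python
def soft_labels(annotations):
    """Per-frame fraction of annotators who marked the frame as anomalous.

    annotations: list of equal-length lists of 0/1 labels, one per annotator.
    """
    n_annotators = len(annotations)
    n_frames = len(annotations[0])
    return [
        sum(annotator[f] for annotator in annotations) / n_annotators
        for f in range(n_frames)
    ]


def consensus_labels(annotations, min_agreement=0.5):
    """Binary per-frame label: 1 if more than min_agreement of annotators agree."""
    return [1 if p > min_agreement else 0 for p in soft_labels(annotations)]


# Hypothetical labels from four annotators over five frames.
annotations = [
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0],
]
soft = soft_labels(annotations)
consensus = consensus_labels(annotations)
```

Here the soft labels retain the disagreement at event boundaries (frames 1, 3, and 4), while the consensus labels keep only the frames on which a strict majority agrees.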

Paper Structure

This paper contains 26 sections, 5 equations, 10 figures, 3 tables.

Figures (10)

  • Figure 1: A VAD model is trained on normal frames from an underwater camera to detect interesting events. As the fish enters the scene, the anomaly score from the model increases until the fish disappears again. The interesting event can then be detected as the sequence with a consistently high anomaly score. The multi-annotator ground truth captures that some parts of the video may be less likely to be considered interesting.
  • Figure 2: Sample images of anomalies in scene A (top) and B (bottom) in AURA: Anomalous Underwater Reef Activity.
  • Figure 3: The AnemoCam features an adjustable LED light and a wide-angle high-resolution camera. To avoid buildup of biofouling, a mechanical wiper periodically sweeps the camera lens.
  • Figure 4: A screenshot of our custom tool AnomaTag for our anomalous event annotation.
  • Figure 5: Cohen's Kappa scores between annotators.
  • ...and 5 more figures

Theorems & Definitions (1)

  • Definition 1