Evaluating the Effectiveness of Video Anomaly Detection in the Wild: Online Learning and Inference for Real-world Deployment

Shanle Yao; Ghazal Alinezhad Noghre; Armin Danesh Pazho; Hamed Tabkhi

TL;DR

The paper addresses the gap between offline VAD research and real-world deployment by proposing an online unsupervised framework for pose-based VAD that learns from streaming data. It evaluates three state-of-the-art pose-based methods (GEPC, STG-NF, TSGAD) across the ShanghaiTech and CHAD domains, demonstrating that online adaptation can preserve a large portion of offline performance in target domains. The findings underscore the potential of online learning to enable privacy-preserving, real-world VAD with cross-domain robustness, while highlighting ongoing challenges in domain shifts and data-volume variability. The work lays groundwork for integrated streaming training and timing analysis to advance practical, in-the-wild VAD systems.

Abstract

Video Anomaly Detection (VAD) identifies unusual activities in video streams, a key technology with broad applications ranging from surveillance to healthcare. Tackling VAD in real-life settings poses significant challenges due to the dynamic nature of human actions, environmental variations, and domain shifts. Many research initiatives neglect these complexities, often concentrating on traditional testing methods that fail to account for performance on unseen datasets, creating a gap between theoretical models and their real-world utility. Online learning is a potential strategy to mitigate this issue by allowing models to adapt to new information continuously. This paper assesses how well current VAD algorithms can adjust to real-life conditions through an online learning framework, focusing on pose-based algorithms because of their efficiency and privacy advantages. Our proposed framework enables continuous model updates with streaming data from novel environments, thus mirroring real-world challenges and evaluating the models' ability to adapt in real time while maintaining accuracy. We investigate three state-of-the-art models in this setting, focusing on their adaptability across different domains. Our findings indicate that, even under the most challenging conditions, our online learning approach allows a model to preserve 89.39% of its original effectiveness compared to its offline-trained counterpart in a specific target domain.
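
One plausible reading of the "preserved effectiveness" figure in the abstract is the ratio of the online-trained model's score to the offline-trained model's score on the same target domain. The sketch below works that ratio through under that assumption. The function name `retained_percentage` and the input values are hypothetical and are not results from the paper, and the abstract does not state which metric the 89.39% figure is based on.

```python
def retained_percentage(online_score: float, offline_score: float) -> float:
    """Share of the offline-trained score that the online-trained model keeps.

    Assumption: both scores use the same higher-is-better metric
    (for example AUC-ROC) and are measured on the same target domain.
    """
    if offline_score <= 0:
        raise ValueError("offline_score must be positive")
    return 100.0 * online_score / offline_score


# Hypothetical numbers for illustration only, not values from the paper.
print(retained_percentage(60.0, 75.0))
```

A ratio like this only makes sense for metrics where higher is better; for an error metric such as EER the comparison would need to be inverted or reported as a difference.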

Paper Structure (11 sections, 5 figures, 3 tables)

Introduction
Related Works
Methodology
Inference Methodology
Input Stream
Detection
Collection Methodology
Training Methodology
Experiments and Evaluation
Research Questions and Future Directions
Conclusion

Figures (5)

Figure 1: A conceptual overview of an end-to-end system with online unsupervised anomaly detection training. Frame sequences (FS) collected from surveillance cameras pass through a pre-processing phase to extract necessary annotations (A), including bounding boxes (BB), tracking information (ID), and pose information. This information consequently goes through anomaly detection, which is used for real-time inference and collection. The collection algorithm collects enough frame annotations (F[n]) for training. After training, Updated Weights (UW) are replaced for the next inference step.
Figure 2: A conceptual overview of an end-to-end system with online unsupervised anomaly detection training.
Figure 3: Model AUC-ROC percentage Trend Comparison by Training Number: Long dashes indicate Offline Training, solid lines indicate Online Training, and dots indicate the Baseline (No training).
Figure 4: Model AUC-PR percentage Trend Comparison by Training Number: Long dashes indicate Offline Training, solid lines indicate Online Training, and dots indicate the Baseline (No training).
Figure 5: Model EER percentage Trend Comparison by Training Number: Long dashes indicate Offline Training, solid lines indicate Online Training, and dots indicate the Baseline (No training).
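
The loop described in the Figure 1 caption (score each incoming annotation, collect annotations until there are enough for training, train, then swap in the updated weights) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: `ToyDetector`, `run_online`, the scalar `feature`, and the buffer size `n` are all hypothetical stand-ins, and the toy "model" is just a running mean rather than GEPC, STG-NF, or TSGAD. The sketch also assumes every annotation is collected; the caption does not say whether the collection step filters on the anomaly score.

```python
from dataclasses import dataclass
from statistics import fmean
from typing import Iterable, List, Tuple


@dataclass
class ToyDetector:
    """Hypothetical stand-in for a pose-based model; its only weight is a mean."""
    mean: float = 0.0

    def score(self, feature: float) -> float:
        # Anomaly score: distance of the feature from what the model has seen.
        return abs(feature - self.mean)

    def train(self, features: List[float]) -> "ToyDetector":
        # Unsupervised "training": refit the mean on the collected features.
        return ToyDetector(mean=fmean(features))


def run_online(
    stream: Iterable[float], detector: ToyDetector, n: int
) -> Tuple[List[float], ToyDetector, int]:
    """Score every item in the stream and retrain after each n collected items."""
    scores: List[float] = []
    buffer: List[float] = []
    updates = 0
    for feature in stream:
        scores.append(detector.score(feature))  # real-time inference
        buffer.append(feature)                  # collection (assumed unfiltered)
        if len(buffer) >= n:                    # enough annotations collected
            detector = detector.train(buffer)   # training on the buffer
            buffer = []                         # updated weights used from here on
            updates += 1
    return scores, detector, updates


scores, final_detector, updates = run_online([1.0, 1.0, 1.0, 1.0, 5.0], ToyDetector(), n=4)
print(scores, final_detector.mean, updates)
```

In this toy run the first four items are scored with the initial weights, the model is retrained once, and the fifth item is scored with the updated weights, which mirrors the inference, collection, and weight-replacement order in the caption.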
