Generalized Video Anomaly Event Detection: Systematic Taxonomy and Comparison of Deep Models

Yang Liu; Dingkang Yang; Yan Wang; Jing Liu; Jun Liu; Azzedine Boukerche; Peng Sun; Liang Song

TL;DR

This survey argues for a unified treatment of Generalized Video Anomaly Event Detection (GVAED) that covers the unsupervised, weakly-supervised, supervised, and fully-unsupervised paradigms. It introduces a hierarchical taxonomy organized by supervision, input data, and network structure, and collects datasets, codebases, and literature to support comparisons of deep models. The authors analyze cross-scenario and multimodal challenges, highlight performance trends across UVAD, WAED, SVAD, and FVAD, and discuss practical considerations such as online deployment and data realism. By synthesizing development trends and offering a public resource, the work aims to guide researchers and practitioners toward scalable, real-world GVAED solutions that can operate across scenes, modalities, and platforms.

Abstract

Video Anomaly Detection (VAD) serves as a pivotal technology in intelligent surveillance systems, enabling the temporal or spatial identification of anomalous events within videos. While existing reviews predominantly concentrate on conventional unsupervised methods, they often overlook the emergence of weakly-supervised and fully-unsupervised approaches. To address this gap, this survey extends the conventional scope of VAD beyond unsupervised methods, encompassing a broader spectrum termed Generalized Video Anomaly Event Detection (GVAED). By incorporating recent advancements rooted in diverse assumptions and learning frameworks, this survey introduces an intuitive taxonomy that spans unsupervised, weakly-supervised, supervised, and fully-unsupervised VAD methodologies, elucidating the distinctions and interconnections among these research trajectories. In addition, this survey supports prospective researchers by assembling a compilation of research resources, including public datasets, available codebases, programming tools, and pertinent literature. Furthermore, this survey quantitatively assesses model performance, examines research challenges, and outlines potential avenues for future exploration.

Paper Structure (42 sections, 4 equations, 8 figures, 8 tables)

Introduction
Literature Statistics
Related Reviews
Contribution Summary
Foundations of GVAED
Definition of the Anomaly
Problem Formation
Benchmark Datasets
Subway Entrance & Exit
UMN
UCSD Pedestrian
CUHK Avenue
ShanghaiTech
UCF-Crime
XD-Violence
...and 27 more sections

Figures (8)

Figure 1: Taxonomy of Generalized Video Anomaly Event Detection (GVAED). We provide a hierarchical taxonomy that organizes existing deep GVAED models by supervision signals, model inputs, and network structure into a systematic framework, including Unsupervised Video Anomaly Detection (UVAD), Weakly-supervised Abnormal Event Detection (WAED), Fully-unsupervised VAD (FVAD), and Supervised VAD (SVAD). In addition, we collate benchmark datasets, evaluation metrics, available code, and literature in a public GitHub repository. Finally, we analyze the research challenges and possible trends.
Figure 2: Publication and citation statistics on the topic of (a) Video Anomaly Detection and (b) Abnormal Event Detection.
Figure 3: Illustration of training data. (a) UVAD trains the model using only normal data, with the implicit assumption that all video-level and frame-level labels are 0. (b) WAED models use positive and negative samples and require only video-level labels, where $Y=0$ indicates a normal video and $Y=1$ indicates an anomalous one. (c) SVAD trains a supervised model on fine-grained frame-level labels, from which the video-level labels follow directly. (d) FVAD attempts to learn the anomaly detector from unprocessed data: the training set contains both normal and anomalous samples and carries no labels at any level.
Figure 4: Illustration of the two-stage UVAD framework. Anomaly detection is performed in the test phase as a downstream task of proxy task-based normality learning. The example video frames are from the CUHK Avenue dataset.
Figure 5: Structure of the MIL ranking model MIR. The anomalous video $\mathcal{V}_a$ and the normal video $\mathcal{V}_n$ are first sliced into several equal-size instances. The positive bag $\mathcal{B}_a$ contains at least one positive instance, while the negative bag $\mathcal{B}_n$ contains only normal instances. In the test phase, the well-trained MIL regression model outputs the anomaly scores of instances in the test video $\mathcal{V}_t$ directly.
...and 3 more figures
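
The bag construction described in the Figure 5 caption can be illustrated with a small sketch. It assumes the commonly used hinge form of the MIL ranking objective, in which the highest-scoring instance of the positive bag should outrank the highest-scoring instance of the negative bag by a margin. The margin value of 1, the function name `mil_ranking_loss`, and the example scores are assumptions made for illustration and are not taken from this page; any additional regularization terms a particular model may use are omitted.

```python
from typing import Sequence


def mil_ranking_loss(
    positive_bag_scores: Sequence[float],
    negative_bag_scores: Sequence[float],
    margin: float = 1.0,  # assumed margin; not specified on this page
) -> float:
    """Hinge ranking loss between the top-scoring instances of two bags.

    positive_bag_scores: anomaly scores of the instances sliced from an
        anomalous video (at least one instance is truly anomalous).
    negative_bag_scores: anomaly scores of the instances sliced from a
        normal video (all instances are normal).
    """
    if len(positive_bag_scores) == 0 or len(negative_bag_scores) == 0:
        raise ValueError("each bag needs at least one instance score")
    # Only video-level labels are known, so each bag is represented by
    # its most anomalous-looking instance.
    top_positive = max(positive_bag_scores)
    top_negative = max(negative_bag_scores)
    # The loss is zero once the positive bag leads by at least the margin.
    return max(0.0, margin - top_positive + top_negative)


# Hypothetical instance scores for one anomalous and one normal video.
loss = mil_ranking_loss([0.1, 0.9, 0.2], [0.2, 0.3, 0.1])
print(f"ranking loss: {loss:.2f}")
```

Taking the maximum over each bag is what lets training proceed with video-level labels alone: the loss never needs to know which instance in the positive bag is the anomalous one.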
