Video Anomaly Detection via Spatio-Temporal Pseudo-Anomaly Generation: A Unified Approach
Ayush K. Rai, Tarun Krishna, Feiyan Hu, Alexandru Drimbarean, Kevin McGuinness, Alan F. Smeaton, Noel E. O'Connor
TL;DR
This work tackles video anomaly detection under open-set conditions by generating generic spatio-temporal pseudo-anomalies (PAs) without dataset-specific priors. It uses a pre-trained Latent Diffusion Model to create spatial PAs via inpainting and applies mixup to optical-flow patches to create temporal PAs; a ViFi-CLIP-based semantic discriminator captures semantic inconsistency. A unified one-class classification (OCC) framework jointly estimates reconstruction quality, temporal irregularity, and semantic inconsistency through two 3D-CNN autoencoders and the semantic discriminator, and aggregates the three indicators into a single anomaly score. Experiments on Ped2, Avenue, ShanghaiTech, and UBnormal show performance competitive with state-of-the-art methods and provide evidence that PAs transfer across datasets, supporting the robustness and generalization of the approach.
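As a rough illustration of how three per-frame indicators might be combined into one anomaly score, the sketch below min-max normalizes each indicator over a video and takes a weighted average. The normalization scheme, the equal default weights, and the names `min_max_normalize` and `aggregate_anomaly_score` are assumptions made for this example; the summary above does not specify the aggregation rule.

```python
import numpy as np


def min_max_normalize(scores, eps=1e-8):
    """Rescale a 1-D array of per-frame scores to roughly [0, 1]."""
    scores = np.asarray(scores, dtype=np.float64)
    lo, hi = scores.min(), scores.max()
    return (scores - lo) / (hi - lo + eps)


def aggregate_anomaly_score(recon_err, temporal_err, semantic_score,
                            weights=(1.0, 1.0, 1.0)):
    """Weighted average of three normalized per-frame indicators.

    All three inputs are assumed to be oriented so that a higher value
    means "more anomalous". The weights are hypothetical and would
    need tuning per dataset.
    """
    parts = [min_max_normalize(s)
             for s in (recon_err, temporal_err, semantic_score)]
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()  # make the weights sum to one
    return sum(wi * p for wi, p in zip(w, parts))


# Toy example: the third frame is high on every indicator.
scores = aggregate_anomaly_score(
    recon_err=[0.1, 0.2, 0.9],
    temporal_err=[0.0, 0.1, 1.0],
    semantic_score=[0.2, 0.2, 0.8],
)
```

Normalizing each indicator first keeps any one of them from dominating the sum merely because it is measured on a larger scale.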
Abstract
Video Anomaly Detection (VAD) is an open-set recognition task that is usually formulated as a one-class classification (OCC) problem, where the training data consists of videos with normal instances while the test data contains both normal and anomalous instances. Recent works have investigated creating pseudo-anomalies (PAs) from normal data alone, making strong assumptions about real-world anomalies with regard to the abnormality of objects and the speed of motion, in order to inject prior information about anomalies into an autoencoder (AE) based reconstruction model during training. This work proposes a novel method for generating generic spatio-temporal PAs by inpainting a masked-out region of an image using a pre-trained Latent Diffusion Model and further perturbing the optical flow using mixup to emulate spatio-temporal distortions in the data. In addition, we present a simple unified framework to detect real-world anomalies under the OCC setting by learning three types of anomaly indicators, namely reconstruction quality, temporal irregularity and semantic inconsistency. Extensive experiments on four VAD benchmark datasets, namely Ped2, Avenue, ShanghaiTech and UBnormal, demonstrate that our method performs on par with existing state-of-the-art PA generation and reconstruction-based methods under the OCC setting. Our analysis also examines the transferability and generalisation of PAs across these datasets, offering insights into how real-world anomalies can be identified through PAs.
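To make the temporal perturbation concrete, here is a minimal sketch of mixup applied to a rectangular patch of an optical-flow field. It assumes the flow is stored as an H x W x 2 array, that the second patch is taken from another flow field at the same location, and that the mixing coefficient is drawn from a Beta(alpha, alpha) distribution as in standard mixup. The function name `mixup_flow_patch`, the patch size, and the value of `alpha` are hypothetical choices for illustration, not details taken from the abstract.

```python
import numpy as np


def mixup_flow_patch(flow, other_flow, patch_size=32, alpha=0.4, rng=None):
    """Blend one random patch of `flow` with the same patch of `other_flow`.

    flow, other_flow: float arrays of shape (H, W, 2) holding (dx, dy).
    Returns the perturbed flow, the patch box (top, left, height, width),
    and the sampled mixing coefficient.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w, _ = flow.shape
    ph, pw = min(patch_size, h), min(patch_size, w)

    # Pick the top-left corner so that the patch stays inside the frame.
    top = int(rng.integers(0, h - ph + 1))
    left = int(rng.integers(0, w - pw + 1))

    # Standard mixup coefficient: lam ~ Beta(alpha, alpha).
    lam = float(rng.beta(alpha, alpha))

    out = flow.copy()
    region = (slice(top, top + ph), slice(left, left + pw))
    out[region] = lam * flow[region] + (1.0 - lam) * other_flow[region]
    return out, (top, left, ph, pw), lam


# Toy example: mix an all-zero flow with an all-one flow.
flow_a = np.zeros((64, 64, 2))
flow_b = np.ones((64, 64, 2))
mixed, box, lam = mixup_flow_patch(flow_a, flow_b,
                                   rng=np.random.default_rng(0))
```

Only the selected patch changes; the rest of the flow field is left untouched, so the perturbation stays local in space while distorting the apparent motion.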
