Enhance Multi-Scale Spatial-Temporal Coherence for Configurable Video Anomaly Detection
Kai Cheng, Xinzhe Li, Lijuan Che
TL;DR
The work tackles the challenge of varying detection demands in unsupervised video anomaly detection by introducing Degree of Tolerance (DoT) and a configurable two-tier CVAD architecture. It couples a stack-and-block design with Multi-Scale Memory with Selective Mechanism (MS$^2$M) to model spatial-temporal coherence across multiple scales, enabling rapid adaptation to new DoTs by freezing existing blocks and adding new ones. The MS$^2$M module employs depth-wise convolutions for receptive-field growth and attention-based memory reading/writing to memorize and refine normal patterns, optimized by a joint reconstruction and DoT-aware loss. Experiments on three standard benchmarks and a DoT-extended Ped2 dataset demonstrate state-of-the-art performance and substantial training-time savings, highlighting CVAD's practical impact for configurable, resource-efficient VAD in dynamic environments.
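As a rough illustration of attention-based memory reading, the sketch below scores a query feature against a small bank of stored "normal" prototypes and returns their attention-weighted sum. It is a generic dot-product attention read, not the paper's MS$^2$M implementation: the names `read_memory` and `softmax`, the dot-product similarity, and the toy two-dimensional memory are all assumptions made for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read_memory(query, memory):
    # Hypothetical read: dot-product similarity between the query and each
    # memory item, normalized into attention weights with softmax.
    scores = [sum(q * m for q, m in zip(query, item)) for item in memory]
    weights = softmax(scores)
    # The read feature is the attention-weighted sum of the memory items.
    dim = len(query)
    read = [sum(w * item[d] for w, item in zip(weights, memory))
            for d in range(dim)]
    return read, weights

# Toy memory holding two "normal pattern" prototypes (assumed values).
memory = [[1.0, 0.0], [0.0, 1.0]]
read, weights = read_memory([5.0, 0.0], memory)
```

A query close to a stored prototype is read back almost unchanged, whereas a query far from every prototype is pulled toward the stored normal patterns, which is what makes reconstruction error usable as an anomaly signal in memory-based approaches.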
Abstract
Unsupervised Video Anomaly Detection (VAD) builds on techniques from the field of signal processing. Because anomalies are ambiguous and unbounded, detection demands can differ even within a single scenario. Previous methods must retrain their models from scratch whenever these demands change, even slightly, which wastes resources. We therefore propose a configurable VAD design with flexible solutions to this problem. We also design a dataset with good compatibility for evaluating VAD performance when detection demands change. In addition, videos carry important information about continuous changes in an object's appearance and motion, so we propose a module that establishes multi-scale spatial-temporal coherence. This module improves accuracy and can dynamically adjust to capture normal spatial-temporal patterns accurately. Experiments show that our method models coherence effectively and offers better configurability.
