Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection
Jiahao Lyu, Minghua Zhao, Jing Hu, Xuewen Huang, Shuangli Du, Cheng Shi, Zhiyong Lv
TL;DR
This work tackles cross-domain video anomaly detection (VAD) by reframing anomaly handling as a deblurring task. Gaussian-blurred appearance frames serve as global pseudo-anomalies, and a blur-driven autoencoder learns to deblur normal content while attention suppresses blurred real anomalies. A motion-guided memory module then records and retrieves normal motion distributions to widen the normality gap, which enables zero-shot cross-dataset validation without target-domain fine-tuning. The architecture is dual-stream. It combines a motion encoder that uses zero convolutions, multi-scale residual channel attention (MRCA) for feature refinement, and an appearance-motion fusion module. The network is trained with a combination of losses, and anomalies are scored with a PSNR-based scheme. Experiments on Ped2, Avenue, and ShanghaiTech show state-of-the-art or competitive performance, strong cross-dataset transfer, and efficient testing, since motion features are used only during training.
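The two ends of the appearance pipeline can be sketched in plain Python: corrupting an input frame with Gaussian blur to form a pseudo-anomaly, and turning reconstruction quality into a PSNR-based anomaly score. This is a minimal sketch under stated assumptions. The kernel `sigma` and `radius`, the function names, and the scoring rule are illustrative and hypothetical, not the paper's reported settings. The scoring rule normalizes PSNR to [0, 1] within each video and takes the score as 1 minus the normalized PSNR, a common convention in reconstruction- and prediction-based VAD. The method's full score may combine PSNR with other terms. Plain lists keep the example dependency-free; a real implementation would use a tensor library.

```python
import math
from typing import List

Frame = List[List[float]]  # grayscale frame, pixel values assumed in [0, 1]


def gaussian_kernel_1d(sigma: float, radius: int) -> List[float]:
    """Normalized 1-D Gaussian weights for offsets -radius..radius."""
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur(frame: Frame, sigma: float = 1.0, radius: int = 2) -> Frame:
    """Separable Gaussian blur with edge clamping; the output is the pseudo-anomaly."""
    k = gaussian_kernel_1d(sigma, radius)
    h, w = len(frame), len(frame[0])
    offsets = range(-radius, radius + 1)
    # Horizontal pass, clamping column indices at the image border.
    tmp = [[sum(k[o + radius] * frame[y][min(max(x + o, 0), w - 1)] for o in offsets)
            for x in range(w)] for y in range(h)]
    # Vertical pass, clamping row indices at the image border.
    return [[sum(k[o + radius] * tmp[min(max(y + o, 0), h - 1)][x] for o in offsets)
             for x in range(w)] for y in range(h)]


def psnr(pred: Frame, target: Frame, max_val: float = 1.0) -> float:
    """PSNR in dB between a reconstructed frame and the ground-truth frame."""
    n = len(target) * len(target[0])
    mse = sum((p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)) / n
    mse = max(mse, 1e-10)  # avoid division by zero (infinite PSNR) on a perfect match
    return 10.0 * math.log10(max_val ** 2 / mse)


def anomaly_scores(psnrs: List[float]) -> List[float]:
    """Min-max normalize PSNR within one video; low PSNR maps to a high anomaly score."""
    lo, hi = min(psnrs), max(psnrs)
    if hi == lo:
        return [0.0] * len(psnrs)
    return [1.0 - (p - lo) / (hi - lo) for p in psnrs]


frame = [[0.0] * 8 for _ in range(8)]
frame[4][4] = 1.0                      # one bright pixel on a dark background
pseudo_anomaly = gaussian_blur(frame)  # blurred network input during training
print(psnr(pseudo_anomaly, frame) < psnr(frame, frame))  # True
print(anomaly_scores([30.0, 25.0, 20.0]))                # [0.0, 0.5, 1.0]
```

Per-video min-max normalization makes scores comparable within a clip, but it assumes each test video is scored as a whole rather than strictly online.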
Abstract
Video anomaly detection (VAD) typically learns the distribution of normal samples and detects anomalies by measuring significant deviations from it, but undesired generalization may allow some anomalies to be reconstructed well, suppressing those deviations. Meanwhile, most VAD methods cannot handle cross-dataset validation on new target domains, and few-shot methods must rely on laborious fine-tuning with target-domain data to achieve domain adaptation. To address these problems, we propose a novel VAD method with a motion-guided memory module that achieves zero-shot cross-dataset validation. First, we apply Gaussian blur to the raw appearance frames to construct global pseudo-anomalies, which serve as the network input. Then, we propose multi-scale residual channel attention to deblur the pseudo-anomalies in normal samples. Next, memory items are obtained by recording motion features during training and are used to retrieve motion features from the raw information during testing. Finally, our method ignores blurred real anomalies through attention and relies on motion memory items to widen the normality gap between normal and abnormal motion. Extensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed method. Compared with cross-domain methods, our method achieves competitive performance without adaptation during testing.
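The record-and-retrieve cycle of the motion memory follows the general pattern of memory-augmented autoencoders, and a simplified, hypothetical version is sketched below. Memory items are unit vectors. `memory_write` (training) nudges each item toward the motion queries for which it is the most similar item. `memory_read` (testing) rebuilds a query as a temperature-scaled softmax mixture of items. `normality_gap` measures how far a query lies from its retrieval. The item count, feature size, temperature `tau`, update rule, and all function names are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import List

Vec = List[float]


def l2_normalize(v: Vec) -> Vec:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def memory_write(memory: List[Vec], queries: List[Vec]) -> List[Vec]:
    """Training (hypothetical rule): record motion features into their nearest items."""
    queries = [l2_normalize(q) for q in queries]
    # Index of the most similar memory item for every motion query.
    nearest = [max(range(len(memory)), key=lambda j: dot(q, memory[j])) for q in queries]
    updated = []
    for idx, item in enumerate(memory):
        assigned = [q for q, n in zip(queries, nearest) if n == idx]
        if not assigned:
            updated.append(item)  # no query matched this item; keep it unchanged
            continue
        weights = softmax([dot(q, item) for q in assigned])
        delta = [sum(w * q[d] for w, q in zip(weights, assigned)) for d in range(len(item))]
        updated.append(l2_normalize([a + b for a, b in zip(item, delta)]))
    return updated


def memory_read(memory: List[Vec], query: Vec, tau: float = 0.1) -> Vec:
    """Testing: retrieve a motion feature as a softmax-weighted mix of memory items."""
    q = l2_normalize(query)
    weights = softmax([dot(q, item) / tau for item in memory])
    return [sum(w * item[d] for w, item in zip(weights, memory)) for d in range(len(q))]


def normality_gap(memory: List[Vec], query: Vec) -> float:
    """L2 distance between a query and its retrieval; larger for unseen motion."""
    q = l2_normalize(query)
    r = memory_read(memory, q)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, r)))


memory = [l2_normalize(v) for v in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])]
memory = memory_write(memory, [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0]])  # "normal" motion
print(normality_gap(memory, [1.0, 0.05, 0.0]) < normality_gap(memory, [0.0, 0.0, 1.0]))  # True
```

A query resembling recorded normal motion is reconstructed almost exactly from the memory, while motion unlike any stored item can only be approximated by a blend of normal items. This leaves a larger gap that an anomaly score can exploit.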
