Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection

Jiahao Lyu; Minghua Zhao; Jing Hu; Xuewen Huang; Shuangli Du; Cheng Shi; Zhiyong Lv

TL;DR

This work tackles cross-domain video anomaly detection (VAD) by reframing anomaly handling as a deblurring task: Gaussian-blurred appearance frames serve as global pseudo-anomalies, and a blur-driven autoencoder learns to deblur normal content while its attention mechanism ignores real anomalies that have been blurred. A motion-guided memory module records normal motion features during training and retrieves them at test time, widening the gap between normal and abnormal motion and enabling zero-shot cross-dataset validation without target-domain fine-tuning. The method combines a dual-stream architecture with a motion encoder that uses zero convolutions, feature refinement based on multi-scale residual channel attention (MRCA), and an appearance-motion fusion module; it is trained with a combination of losses and uses a PSNR-based anomaly score. Experiments on Ped2, Avenue, and ShanghaiTech show state-of-the-art or competitive performance and strong cross-dataset transfer, and testing is efficient because only blurred appearance images are input at test time.
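The PSNR-based scoring mentioned above can be illustrated with a minimal sketch. It assumes frames scaled to [0, 1] and the per-video min-max normalisation that is common in prediction-based VAD; the paper's exact scoring formula is not given here and may differ. The function names `psnr` and `anomaly_scores` are hypothetical.

```python
import numpy as np

def psnr(pred, target, max_val=1.0, eps=1e-12):
    """Peak signal-to-noise ratio between a predicted and a ground-truth frame.

    Assumes both frames are arrays scaled to [0, max_val].
    """
    diff = np.asarray(pred, dtype=np.float64) - np.asarray(target, dtype=np.float64)
    mse = np.mean(diff ** 2)
    # eps guards against division by zero for identical frames.
    return 10.0 * np.log10(max_val ** 2 / max(mse, eps))

def anomaly_scores(psnr_values):
    """Map the PSNR values of one video to scores in [0, 1].

    Assumption: min-max normalisation per video, inverted so that a
    low PSNR (poor prediction) yields a high anomaly score.
    """
    p = np.asarray(psnr_values, dtype=np.float64)
    span = p.max() - p.min()
    if span == 0:
        return np.zeros_like(p)
    return 1.0 - (p - p.min()) / span
```

Under this convention, a frame that the network predicts poorly gets a low PSNR and therefore a score close to 1.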

Abstract

Video anomaly detection (VAD) methods typically learn the distribution of normal samples and detect anomalies by measuring significant deviations from it, but undesired generalization may allow the model to reconstruct some anomalies, thereby suppressing those deviations. Meanwhile, most VAD methods cannot cope with cross-dataset validation on new target domains, and few-shot methods must laboriously tune the model on the target domain to complete domain adaptation. To address these problems, we propose a novel VAD method with a motion-guided memory module that achieves zero-shot cross-dataset validation. First, we add Gaussian blur to the raw appearance images, thereby constructing a global pseudo-anomaly that serves as the input to the network. Then, we propose multi-scale residual channel attention to deblur the pseudo-anomaly in normal samples. Next, memory items are obtained by recording the motion features in the training phase and are used to retrieve the motion features from the raw information in the testing phase. Lastly, our method can ignore the blurred real anomaly through attention and rely on the motion memory items to increase the normality gap between normal and abnormal motion. Extensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed method. Compared with cross-domain methods, our method achieves competitive performance without adaptation during testing.
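As a rough illustration of the first step, the sketch below blurs a frame with a separable Gaussian kernel to form a global pseudo-anomaly input. The kernel width `sigma`, the reflect padding, and the function names are assumptions for illustration only; the abstract does not specify the blur parameters.

```python
import numpy as np

def gaussian_kernel1d(sigma, radius):
    """Normalised 1-D Gaussian kernel of length 2 * radius + 1."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(frame, sigma=2.0, radius=None):
    """Blur an HxW or HxWxC frame with a separable Gaussian.

    sigma is a hypothetical default. Reflect padding keeps the output
    the same size as the input; it requires radius < min(H, W).
    """
    if radius is None:
        radius = int(3 * sigma + 0.5)
    k = gaussian_kernel1d(sigma, radius)
    img = np.asarray(frame, dtype=np.float64)
    squeeze = img.ndim == 2
    if squeeze:
        img = img[..., None]
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="reflect")
    # Convolve along the height axis, then along the width axis.
    conv = lambda v: np.convolve(v, k, mode="valid")
    tmp = np.apply_along_axis(conv, 0, padded)
    out = np.apply_along_axis(conv, 1, tmp)
    return out[..., 0] if squeeze else out

# Usage: blur every frame of a short clip of shape (t, H, W).
clip = np.random.rand(4, 16, 16)
blurred_clip = np.stack([gaussian_blur(f) for f in clip])
```

Because the blur is applied to the whole frame, every region is degraded equally, which is what makes the pseudo-anomaly "global" rather than tied to a patch or object.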

Paper Structure (21 sections, 18 equations, 11 figures, 4 tables)

Introduction
Related work
Unsupervised VAD
Cross-domain Detection
Memory Module
Pseudo-anomaly Data
Methodology
Dual-stream Network Architecture
Gaussian blur-driven AE
Multi-scale residual channel attention
Zero convolution removes motion noise
Appearance motion fusion module
Motion-guided Memory Module
Loss Function and Anomaly Scoring
Experiments
...and 6 more sections

Figures (11)

Figure 1: Typical solutions of memory modules in video anomaly detection. Left: Most memory modules reconstruct entire appearance features, but are limited by the memory item size. Middle: A few memory modules reconstruct motion features, but are limited to RoI bounding boxes. Right: The proposed memory module avoids both limitations. Because it retrieves background-independent motion features, it makes both VAD and cross-domain detection simple to implement.
Figure 2: Overview of the proposed method. It employs a dual-stream AE that takes Gaussian-blurred appearance images $B_{1:t}$ and motion images $O_{1:t}$ as input and outputs a predicted image $\hat{I}_{t+1}$. It consists of skip connections with MRCA, a motion-guided memory module, and an appearance-motion fusion module. During testing, only blurred appearance images are input. The horizontal dimension indicates the number of output channels. H and W denote the height and width of features, respectively.
Figure 3: Multi-scale residual channel attention.
Figure 4: Motion-guided memory module. c: Cosine similarities, s: Softmax function. See text for details.
Figure 5: Different input images of three datasets. From top to bottom are UCSD Ped2, CUHK Avenue, and ShanghaiTech.
...and 6 more figures
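The Figure 4 caption names cosine similarity and a softmax as the two operations in the memory module. A minimal sketch of a memory read built from those two operations is shown here, following the generic addressing scheme of memory-augmented autoencoders. The shapes, the function name `memory_read`, and the weighted-sum readout are assumptions, not the paper's confirmed design.

```python
import numpy as np

def memory_read(queries, memory, eps=1e-12):
    """Retrieve features from memory items by soft addressing.

    queries: (N, C) query features, for example one per spatial location.
    memory:  (M, C) memory items recorded from normal motion features.
    Returns the (N, C) retrieved features and the (N, M) addressing weights.
    """
    q = queries / (np.linalg.norm(queries, axis=1, keepdims=True) + eps)
    m = memory / (np.linalg.norm(memory, axis=1, keepdims=True) + eps)
    sim = q @ m.T                                  # c: cosine similarities
    sim = sim - sim.max(axis=1, keepdims=True)     # numerical stability
    weights = np.exp(sim)
    weights /= weights.sum(axis=1, keepdims=True)  # s: softmax over items
    return weights @ memory, weights

# Usage with hypothetical sizes: 5 queries, 10 memory items, 8 channels.
rng = np.random.default_rng(0)
retrieved, weights = memory_read(rng.normal(size=(5, 8)), rng.normal(size=(10, 8)))
```

Each retrieved feature is a convex combination of memory items, so a query far from every stored normal pattern cannot be reproduced closely, which is one way such a module can widen the gap between normal and abnormal motion.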
