UCF-Crime-DVS: A Novel Event-Based Dataset for Video Anomaly Detection with Spiking Neural Networks
Yuanbin Qian, Shuhan Ye, Chong Wang, Xiaojie Cai, Jiangbo Qian, Jiafei Wu
TL;DR
This paper tackles video anomaly detection (VAD) by exploiting dynamic information from event-based sensors. It introduces UCF-Crime-DVS, the first large-scale event-based VAD dataset aligned with UCF-Crime, and proposes a fully spiking framework, the multi-scale spiking fusion network (MSF), which uses a Temporal Interaction Module (TIM) to fuse multi-scale temporal features. The approach combines local and global spiking features via pyramidal dilated convolutions and a SpikingGCN, uses TIM to integrate information across time, and is trained under weak supervision with DMIL and center losses. Empirical results show that MSF achieves a competitive AUC and a low false alarm rate on UCF-Crime-DVS, establishing a new baseline for event-based weakly supervised VAD and highlighting the practical value of event-based data for surveillance tasks.
Abstract
Video anomaly detection plays a significant role in intelligent surveillance systems. To enhance a model's ability to recognize anomalies, previous works have typically relied on RGB, optical flow, and text features. Recently, dynamic vision sensors (DVS) have emerged as a promising technology; they capture visual information as discrete events with very high dynamic range and temporal resolution. Compared with conventional cameras, this reduces data redundancy and improves the capture of moving objects. To introduce this rich dynamic information into the surveillance field, we created the first DVS video anomaly detection benchmark, namely UCF-Crime-DVS. To fully utilize this new data modality, we design a multi-scale spiking fusion network (MSF) based on spiking neural networks (SNNs). This work explores the potential application of dynamic information from event data in video anomaly detection. Our experiments demonstrate the effectiveness of our framework on UCF-Crime-DVS and its superior performance compared to other models, establishing a new baseline for SNN-based weakly supervised video anomaly detection.
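
To make the pairing of event data with SNNs more concrete, the following is a minimal sketch of a leaky integrate-and-fire (LIF) neuron, a common building block of SNNs. The abstract does not state which neuron model MSF uses, so the LIF dynamics, the function name `lif_neuron`, and the parameters `tau`, `v_threshold`, and `v_reset` are all assumptions made for illustration only.

```python
def lif_neuron(inputs, tau=2.0, v_threshold=1.0, v_reset=0.0):
    """Simulate one leaky integrate-and-fire neuron over discrete time steps.

    Hypothetical illustration: not taken from the paper.
    inputs: sequence of input currents, one per time step
            (for example, event counts at one pixel per time bin).
    Returns a list of 0/1 spikes, one per time step.
    """
    v = v_reset  # membrane potential starts at the reset value
    spikes = []
    for x in inputs:
        # Leaky integration: the potential decays toward v_reset
        # while being driven by the current input.
        v = v + (x - (v - v_reset)) / tau
        if v >= v_threshold:
            spikes.append(1)  # emit a spike
            v = v_reset       # hard reset after firing
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    # A constant input above threshold produces regular spiking.
    print(lif_neuron([1.5, 1.5, 1.5, 1.5]))
    # No input produces no spikes.
    print(lif_neuron([0.0, 0.0, 0.0]))
```

Because the neuron emits only binary spikes and stays silent when there is no input, it matches the sparse, change-driven nature of DVS events described above.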
