Detecting Contextual Anomalies by Discovering Consistent Spatial Regions
Zhengye Yang, Richard J. Radke
TL;DR
This work tackles spatial-context-dependent video anomaly detection by learning region-consistent normal behavior through automatic discovery of high-resolution spatial regions. It introduces a three-stage approach: object-centric feature extraction, region discovery via heatmap clustering with a full-covariance Gaussian Mixture Model, and region-specific normality modeling for anomaly scoring. The method achieves state-of-the-art performance on Street Scene while using far fewer, semantically meaningful regions, and it provides interpretable normalcy maps; it also performs competitively on Ped2 and ShanghaiTech. The approach offers practical benefits in efficiency and explainability, and the authors suggest soft region assignment and pose-informed features as future extensions.
Abstract
We describe a method for modeling spatial context to enable video anomaly detection. The main idea is to discover regions that share similar object-level activities by clustering joint object attributes using Gaussian mixture models. We demonstrate that this straightforward approach, using orders of magnitude fewer parameters than competing models, achieves state-of-the-art performance on the challenging spatial-context-dependent Street Scene dataset. As a side benefit, the high-resolution regions discovered by the model also provide explainable normalcy maps for human operators without the need for any pre-trained segmentation model.
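The clustering idea in the abstract can be illustrated with a small, self-contained sketch. This is not the authors' pipeline: it assumes each detected object has already been reduced to a hypothetical 2-D attribute vector (horizontal position, speed), generates synthetic "sidewalk" and "road" samples, fits a two-component full-covariance Gaussian mixture with plain EM, and uses the negative log-likelihood as a stand-in anomaly score. The names `fit_gmm`, `assign_region`, and `anomaly_score`, the synthetic data, and the initial means are all assumptions made for illustration.

```python
import math
import random


def gauss_logpdf(p, mean, cov):
    """Log density of a 2-D Gaussian with a full 2x2 covariance matrix."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    # Squared Mahalanobis distance via the closed-form 2x2 inverse.
    maha = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * (maha + math.log(det)) - math.log(2 * math.pi)


def component_logs(p, model):
    """Per-component log(weight) + log density for one sample."""
    weights, means, covs = model
    return [math.log(w) + gauss_logpdf(p, m, c)
            for w, m, c in zip(weights, means, covs)]


def fit_gmm(points, init_means, n_iter=50, reg=1e-6):
    """Fit a full-covariance GMM with EM, starting from the given means."""
    k, n = len(init_means), len(points)
    model = ([1.0 / k] * k,
             [tuple(m) for m in init_means],
             [((1.0, 0.0), (0.0, 1.0))] * k)
    for _ in range(n_iter):
        # E-step: responsibilities, normalised in log space for stability.
        resp = []
        for p in points:
            logs = component_logs(p, model)
            top = max(logs)
            exps = [math.exp(v - top) for v in logs]
            total = sum(exps)
            resp.append([e / total for e in exps])
        # M-step: re-estimate weight, mean, and full covariance per component.
        weights, means, covs = [], [], []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mx = sum(r[j] * p[0] for r, p in zip(resp, points)) / nj
            my = sum(r[j] * p[1] for r, p in zip(resp, points)) / nj
            sxx = sum(r[j] * (p[0] - mx) ** 2 for r, p in zip(resp, points)) / nj
            syy = sum(r[j] * (p[1] - my) ** 2 for r, p in zip(resp, points)) / nj
            sxy = sum(r[j] * (p[0] - mx) * (p[1] - my)
                      for r, p in zip(resp, points)) / nj
            weights.append(max(nj / n, 1e-12))
            means.append((mx, my))
            covs.append(((sxx + reg, sxy), (sxy, syy + reg)))
        model = (weights, means, covs)
    return model


def assign_region(p, model):
    """Hard assignment: index of the most likely mixture component."""
    logs = component_logs(p, model)
    return logs.index(max(logs))


def anomaly_score(p, model):
    """Negative log-likelihood under the mixture (higher = more unusual)."""
    logs = component_logs(p, model)
    top = max(logs)
    return -(top + math.log(sum(math.exp(v - top) for v in logs)))


# Synthetic (position, speed) samples: slow objects near x=2, fast near x=8.
random.seed(0)
sidewalk = [(random.gauss(2.0, 0.5), random.gauss(1.0, 0.3)) for _ in range(200)]
road = []
for _ in range(200):
    x = random.gauss(8.0, 1.0)
    # Speed is correlated with position, which a full covariance can capture.
    road.append((x, 6.0 + 0.8 * (x - 8.0) + random.gauss(0.0, 0.3)))

model = fit_gmm(sidewalk + road, init_means=[(0.0, 0.0), (10.0, 10.0)])
normal_score = anomaly_score((2.0, 1.0), model)    # slow object on the sidewalk
unusual_score = anomaly_score((2.0, 6.0), model)   # road-like speed on the sidewalk
```

In this toy setting, an object moving at road-like speed in the sidewalk position range receives a higher score than a slow one, which is the kind of spatial-context dependence the abstract refers to. A practical implementation would more likely rely on an established library and richer object attributes than this two-feature example.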
