Detecting Contextual Anomalies by Discovering Consistent Spatial Regions

Zhengye Yang, Richard J. Radke

TL;DR

This work tackles spatial-context-dependent video anomaly detection by learning region-consistent normal behavior through automatic discovery of high-resolution spatial regions. It introduces a three-stage approach: object-centric feature extraction, region discovery via heatmap clustering with a full-covariance Gaussian Mixture Model, and region-specific normality modeling for anomaly scoring. The method achieves state-of-the-art performance on Street Scene using far fewer, semantically meaningful regions and provides interpretable normalcy maps, while also performing competitively on Ped2 and ShanghaiTech. The approach offers practical benefits in efficiency and explainability, and the authors suggest soft region assignment and pose-informed features as future extensions for further gains.

Abstract

We describe a method for modeling spatial context to enable video anomaly detection. The main idea is to discover regions that share similar object-level activities by clustering joint object attributes using Gaussian mixture models. We demonstrate that this straightforward approach, using orders of magnitude fewer parameters than competing models, achieves state-of-the-art performance on the challenging spatial-context-dependent Street Scene dataset. As a side benefit, the high-resolution regions discovered by the model also provide explainable normalcy maps for human operators without the need for any pre-trained segmentation model.
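
The idea in the abstract (discover regions with similar object-level activity, then judge each object against the model of its own region) can be illustrated with a toy sketch. This is not the authors' implementation and rests on several simplifying assumptions: plain k-means stands in for the full-covariance Gaussian mixture clustering, smoothed object-class frequencies stand in for the appearance/motion mixture models, and the grid cells, class names, and function names are all hypothetical.

```python
import math
from collections import Counter, defaultdict

CLASSES = ["person", "car"]  # hypothetical object classes for the toy scene

def cell_histograms(detections, classes=CLASSES):
    """Build a normalized class histogram per grid cell from (cell, cls) pairs."""
    counts = defaultdict(Counter)
    for cell, cls in detections:
        counts[cell][cls] += 1
    hists = {cell: [c[k] / sum(c.values()) for k in classes]
             for cell, c in counts.items()}
    return hists, counts

def kmeans(points, k, iters=20):
    """Plain k-means with farthest-point init (a stand-in for GMM clustering)."""
    keys = sorted(points)
    centers = [points[keys[0]]]
    while len(centers) < k:
        far = max(keys, key=lambda c: min(math.dist(points[c], m) for m in centers))
        centers.append(points[far])
    assign = {}
    for _ in range(iters):
        # Assign each cell to its nearest center, then recompute the centers.
        assign = {c: min(range(k), key=lambda j: math.dist(points[c], centers[j]))
                  for c in keys}
        for j in range(k):
            members = [points[c] for c in keys if assign[c] == j]
            if members:
                centers[j] = [sum(v) / len(members) for v in zip(*members)]
    return assign

def fit_region_models(counts, assign, classes=CLASSES, alpha=1.0):
    """Per-region class probabilities with Laplace smoothing (assumed model)."""
    region_counts = defaultdict(Counter)
    for cell, c in counts.items():
        region_counts[assign[cell]].update(c)
    models = {}
    for region, c in region_counts.items():
        total = sum(c.values()) + alpha * len(classes)
        models[region] = {k: (c[k] + alpha) / total for k in classes}
    return models

def anomaly_score(cell, cls, assign, models):
    """Negative log-likelihood of the class under the cell's region model."""
    return -math.log(models[assign[cell]][cls])

# Toy training data: two sidewalk cells (people only) and two road cells
# (mostly cars, with a few people crossing).
train = ([((0, 0), "person")] * 50 + [((0, 1), "person")] * 50
         + [((0, 2), "car")] * 45 + [((0, 2), "person")] * 5
         + [((0, 3), "car")] * 50)

hists, counts = cell_histograms(train)
regions = kmeans(hists, k=2)
models = fit_region_models(counts, regions)

print("car on sidewalk:", anomaly_score((0, 0), "car", regions, models))
print("car on road:    ", anomaly_score((0, 3), "car", regions, models))
```

In this toy scene the same object class receives a much higher score on the sidewalk cells than on the road cells, which is the kind of spatial-context-dependent anomaly the method targets. A location-agnostic model trained on all cells at once would see cars as common everywhere and could not make that distinction.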

Paper Structure

This paper contains 15 sections, 3 equations, 8 figures, 7 tables.

Figures (8)

  • Figure 1: A toy example of building spatial context. Using the discovered semantically meaningful regions, spatial-context-dependent anomalies can be easily detected.
  • Figure 2: The proposed method discovers regions that have similar object and motion characteristics (top left image), illustrated here on the Street Scene dataset [ramachandra_street_2020] (lower left image). Each region is characterized by a learned mixture of prototypical events in appearance/motion feature space. On the right, we show motion descriptors in magnitude/angle space for learned modes of the mixture model in several regions, along with the corresponding objects.
  • Figure 3: A CNN-based autoencoder fails to learn spatial context due to its shift-invariant nature. The autoencoder can perfectly reconstruct both normal and abnormal samples without understanding what is allowable in each region.
  • Figure 4: Overview of the proposed system's inference pipeline. Extracted object-level features are assigned to different GMMs based on the region discovery results.
  • Figure 5: The region discovery process. We form a heatmap over all training objects that belong to a certain location. After collecting the histogram for each region, we cluster and form semantic regions.
  • ...and 3 more figures