Video Anomaly Detection with Contours -- A Study

Mia Siemon, Ivan Nikolov, Thomas B. Moeslund, Kamal Nasrollahi

TL;DR

This study investigates pose-based video anomaly detection (VAD) that learns normal motion patterns from 2D human contours rather than skeletons. Contours keep the privacy benefits of pose-based methods, are hypothesized to extend to more object categories, and are processed here with shallow networks to keep computation low. The study introduces two contour representations (a radii-based feature descriptor and shape contexts) and evaluates both regression and classification pipelines, including VAE, LAE, TAE, and R-RNN models as well as shape clustering and classification with novelty detection. Evaluations on six datasets show that the linear auto-encoders (LAE/TAE) often outperform the variational baseline (VAE), with TAE delivering strong VAD results and PAD results surpassing prior art in several settings. The findings suggest that contour-based approaches are a promising, privacy-friendly direction for VAD, with potential to extend to multi-class contour-based analysis of other object categories.

Abstract

In Pose-based Video Anomaly Detection, prior art is rooted in the assumption that abnormal events can mostly be regarded as a result of uncommon human behavior. Rather than using skeleton representations of humans, however, we investigate the potential of learning recurrent motion patterns of normal human behavior from 2D contours. While keeping all advantages of pose-based methods, such as increased object anonymization, the shift from human skeletons to contours is hypothesized to leave open the opportunity to cover more object categories in future research. We propose formulating the problem as both a regression and a classification task, and additionally explore two distinct data representation techniques for contours. To further reduce the computational complexity of Pose-based Video Anomaly Detection solutions, all methods in this study are based on shallow Neural Networks from the field of Deep Learning. They are evaluated on the three most prominent benchmark datasets within Video Anomaly Detection and their human-related counterparts, totaling six datasets. Our results indicate that this novel perspective on Pose-based Video Anomaly Detection marks a promising direction for future research.

Paper Structure

This paper contains 27 sections, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Overview of this study: Human contours are depicted by two distinct feature descriptors and analyzed by different regression and classification models to detect anomalous behavior in video.
  • Figure 2: Data generation pipeline for both feature descriptors.
  • Figure 3: Architecture of the Variational Auto-Encoder (VAE) with four convolutional layers, serving as our baseline. The radii-based feature descriptor images are resized into quadratic form before training.
  • Figure 4: Architecture of the feedforward Linear Auto-Encoder (LAE) with two fully connected layers. The radii-based feature descriptor images are resized into quadratic form before training.
  • Figure 5: Architecture of the linear Tabular Auto-Encoder (TAE) with four fully connected layers. The radii-based feature descriptor images are transformed into tabular representation before training.
  • ...and 2 more figures
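
The captions above refer to a radii-based feature descriptor, but this excerpt does not spell out how it is built. The following is only a minimal sketch of one plausible reading, not the authors' construction: it assumes the descriptor records, for a fixed number of angular bins around the contour centroid, the largest centroid-to-contour distance, normalized by the overall maximum. The function name `radii_descriptor`, the parameter `num_bins`, and the normalization step are all hypothetical.

```python
import math


def radii_descriptor(contour, num_bins=36):
    """Sketch of a radii-style contour descriptor (an assumption, not the paper's definition).

    contour: list of (x, y) points on a closed 2D contour.
    Returns num_bins values in [0, 1]: for each angular bin around the
    centroid, the largest centroid-to-point distance, divided by the
    overall maximum radius. Bins without any contour point stay 0.0.
    """
    if not contour:
        raise ValueError("contour must contain at least one point")

    # Centroid of the contour points.
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)

    radii = [0.0] * num_bins
    for x, y in contour:
        dx, dy = x - cx, y - cy
        # Map the angle to [0, 2*pi) and pick the matching bin.
        angle = math.atan2(dy, dx) % (2 * math.pi)
        bin_index = min(int(angle / (2 * math.pi) * num_bins), num_bins - 1)
        radii[bin_index] = max(radii[bin_index], math.hypot(dx, dy))

    # Dividing by the largest radius makes the result independent of scale.
    peak = max(radii)
    if peak > 0:
        radii = [r / peak for r in radii]
    return radii


# Usage: the four corners of a square fall into four distinct bins.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
descriptor = radii_descriptor(square, num_bins=4)
```

Under these assumptions the output is a fixed-length vector regardless of how many points the contour has, which is the kind of input a fully connected auto-encoder needs. Whether the paper bins by angle, samples the contour differently, or normalizes at all cannot be confirmed from this excerpt.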