An Attribute-based Method for Video Anomaly Detection

Tal Reiss, Yedid Hoshen

TL;DR

This work tackles video anomaly detection under a one-class setting by introducing an explicit attribute-based representation. Each frame is described by object-level velocity and pose, augmented with CLIP-based deep features, and anomaly scores are derived via density estimation with per-feature calibration. The approach achieves state-of-the-art results on Ped2, Avenue, and ShanghaiTech, with velocity alone providing strong performance and the combination with pose and CLIP delivering further gains. The method is lightweight, scalable, and relies on off-the-shelf components, offering a practical VAD solution with clear priors and robust performance across datasets.

Abstract

Video anomaly detection (VAD) identifies suspicious events in videos, which is critical for crime prevention and homeland security. In this paper, we propose a simple but highly effective VAD method that relies on attribute-based representations. The base version of our method represents every object by its velocity and pose, and computes anomaly scores by density estimation. Surprisingly, this simple representation is sufficient to achieve state-of-the-art performance on ShanghaiTech, the most commonly used VAD dataset. Combining our attribute-based representations with an off-the-shelf, pretrained deep representation yields state-of-the-art performance, with AUROCs of $99.1\%$, $93.7\%$, and $85.9\%$ on Ped2, Avenue, and ShanghaiTech, respectively.
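
The scoring step described above (density estimation on each feature, followed by per-feature calibration) can be illustrated with a minimal sketch. The text here does not name the density estimator or the calibration rule, so the k-nearest-neighbor distance and the min-max scaling below are assumptions chosen for illustration, and the function names `knn_anomaly_scores`, `min_max_calibrate`, and `combine_scores` are hypothetical.

```python
import numpy as np


def knn_anomaly_scores(train_feats, test_feats, k=1):
    """Score each test vector by its mean distance to its k nearest
    training (normal) vectors. Larger distance = lower density = more anomalous.
    ASSUMPTION: kNN distance stands in for the unspecified density estimator.
    """
    # Pairwise Euclidean distances, shape (n_test, n_train).
    diffs = test_feats[:, None, :] - train_feats[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep the k smallest distances for every test vector.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)


def min_max_calibrate(scores, reference_scores):
    """Rescale scores using the range of scores observed on normal data.
    ASSUMPTION: min-max scaling stands in for the unspecified calibration.
    """
    lo, hi = reference_scores.min(), reference_scores.max()
    return (scores - lo) / (hi - lo + 1e-12)


def combine_scores(calibrated_scores):
    """Sum the calibrated scores of all features (e.g. velocity, pose, deep)."""
    return np.sum(np.stack(calibrated_scores, axis=0), axis=0)
```

Calibrating each feature before summing keeps a feature with a large raw distance scale from dominating the combined score.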

Paper Structure

This paper contains 21 sections, 2 equations, 10 figures, and 9 tables.

Figures (10)

  • Figure 1: The Avenue and ShanghaiTech datasets. We present the most normal and anomalous frames for each feature. For anomalous frames, we visualize the bounding box of the object with the highest anomaly score. Best viewed in color.
  • Figure 2: An overview of our method. We first extract optical flow maps and bounding boxes for all of the objects in the frame. We then crop each object from the original image and its corresponding flow map. Our representation consists of velocity, pose, and deep (CLIP) features.
  • Figure 3: An illustration of our velocity feature vector. Left: We quantize the orientations into $B=8$ equi-spaced bins and assign each optical flow vector in the object's bounding box to a single bin. Right: The value of each bin is the average magnitude of the optical flow vectors assigned to this bin. Best viewed in color.
  • Figure 4: Frame-level scores and anomaly localizations for Avenue's test video $04$. Best viewed in color.
  • Figure 5: Frame-level scores and anomaly localizations for ShanghaiTech's test video $03\_0059$. Best viewed in color.
  • ...and 5 more figures
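
The velocity feature described in the Figure 3 caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the optical flow inside an object's bounding box is given as an `(H, W, 2)` array of `(dx, dy)` vectors and that the first bin starts at angle 0; the function name `velocity_histogram` is hypothetical.

```python
import numpy as np


def velocity_histogram(flow, num_bins=8):
    """Build a velocity feature from the optical flow inside one bounding box.

    flow: array of shape (H, W, 2) holding (dx, dy) per pixel (ASSUMED layout).
    Returns a vector of length num_bins; entry b is the average magnitude of
    the flow vectors whose orientation falls into bin b (0 for empty bins).
    """
    dx = flow[..., 0].ravel()
    dy = flow[..., 1].ravel()
    magnitude = np.hypot(dx, dy)

    # Orientation mapped to [0, 2*pi); bin 0 is ASSUMED to start at angle 0.
    angle = np.arctan2(dy, dx) % (2 * np.pi)
    bin_width = 2 * np.pi / num_bins
    # Clip guards against an angle that rounds up to exactly 2*pi.
    bin_index = np.minimum((angle / bin_width).astype(int), num_bins - 1)

    # Sum of magnitudes and number of vectors per bin.
    sums = np.bincount(bin_index, weights=magnitude, minlength=num_bins)
    counts = np.bincount(bin_index, minlength=num_bins)

    # Average magnitude per bin; empty bins stay at zero.
    return np.divide(sums, counts, out=np.zeros(num_bins), where=counts > 0)
```

Note that zero-length flow vectors land in bin 0 in this sketch and lower that bin's average; whether static pixels should be filtered out first is left open here.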