An Attribute-based Method for Video Anomaly Detection
Tal Reiss, Yedid Hoshen
TL;DR
This work tackles video anomaly detection under a one-class setting by introducing an explicit attribute-based representation. Each frame is described by object-level velocity and pose, augmented with CLIP-based deep features, and anomaly scores are derived via density estimation with per-feature calibration. The approach achieves state-of-the-art results on Ped2, Avenue, and ShanghaiTech, with velocity alone providing strong performance and the combination with pose and CLIP delivering further gains. The method is lightweight, scalable, and relies on off-the-shelf components, offering a practical VAD solution with clear priors and robust performance across datasets.
Abstract
Video anomaly detection (VAD) identifies suspicious events in videos, which is critical for crime prevention and homeland security. In this paper, we propose a simple but highly effective VAD method that relies on attribute-based representations. The base version of our method represents every object by its velocity and pose, and computes anomaly scores by density estimation. Surprisingly, this simple representation is sufficient to achieve state-of-the-art performance on ShanghaiTech, the most commonly used VAD dataset. Combining our attribute-based representations with an off-the-shelf, pretrained deep representation yields state-of-the-art performance, with AUROC of $99.1\%$, $93.7\%$, and $85.9\%$ on Ped2, Avenue, and ShanghaiTech, respectively.
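To make the scoring idea concrete, the following is a minimal sketch of density-based anomaly scoring over several per-object feature types, each calibrated before being combined. It is not the authors' implementation. It assumes a k-nearest-neighbor distance as the density estimator, min-max calibration computed on a held-out split of normal data, and a plain sum to combine the calibrated scores. The function names (`knn_score`, `calibrate`, `anomaly_scores`) and the dictionary layout are hypothetical, and feature extraction (velocity, pose, deep features) is assumed to have happened upstream.

```python
import numpy as np


def knn_score(train, query, k=1):
    """Mean Euclidean distance from each query row to its k nearest training rows.

    A larger distance means the query lies in a lower-density region of the
    normal training data, so it is treated as more anomalous.
    """
    # Pairwise distances, shape (n_query, n_train).
    dists = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=-1)
    k = min(k, train.shape[0])
    # np.partition places the k smallest distances in the first k columns.
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


def calibrate(scores, ref_scores):
    """Min-max scale scores using statistics of reference (normal) scores."""
    lo, hi = ref_scores.min(), ref_scores.max()
    return (scores - lo) / max(hi - lo, 1e-12)


def anomaly_scores(train_feats, calib_feats, test_feats, k=1):
    """Sum of calibrated kNN scores over all feature types.

    Each argument maps a feature name (e.g. "velocity", "pose") to an array
    of shape (n_objects, dim). Summation as the combination rule is an
    assumption of this sketch.
    """
    n_test = next(iter(test_feats.values())).shape[0]
    total = np.zeros(n_test)
    for name, train in train_feats.items():
        ref = knn_score(train, calib_feats[name], k)   # scores of held-out normal data
        raw = knn_score(train, test_feats[name], k)    # scores of test objects
        total += calibrate(raw, ref)
    return total
```

Calibrating each feature type separately keeps one representation from dominating the sum merely because its distances live on a larger numeric scale.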
