TeG: Temporal-Granularity Method for Anomaly Detection with Attention in Smart City Surveillance

Erkut Akdag, Egor Bondarev, Peter H. N. De With

TL;DR

This paper presents a temporal-granularity method for an anomaly detection model (TeG) in real-world surveillance, combining spatio-temporal features at different time-scales.

Abstract

Anomaly detection in video surveillance has recently gained interest from the research community. The temporal duration of anomalies varies within video streams, which complicates learning the temporal dynamics of specific events. This paper presents a temporal-granularity method for an anomaly detection model (TeG) in real-world surveillance, combining spatio-temporal features at different time-scales. The TeG model employs multi-head cross-attention blocks and multi-head self-attention blocks for this purpose. Additionally, we extend the UCF-Crime dataset with new anomaly types relevant to a Smart City research project. The TeG model is deployed and validated in a city surveillance system, achieving successful real-time results in industrial settings.

Paper Structure

This paper contains 9 sections, 4 equations, 2 figures, 3 tables.

Figures (2)

  • Figure 1: Workflow of the TeG model. The input video is split into 32 video segments. The Video Swin Transformer (VST) extracts features $F_{S}$, $F_{M}$, and $F_{L}$ at short, medium, and long temporal granularities. Next, the TeG captures the correlations among the three features and the dependencies among different video segments to fuse the features into the output feature matrix $X$. After a classifier produces anomaly scores for the 32 video segments, detailed information about the detected anomaly is provided to control-room operators.
  • Figure 2: Anomaly visualizations and per-frame detection scores of the TeG model. The three sub-figures on the left display anomalies: dangerous throwing, fighting, and littering. The top figures show example frames sampled from the related anomalies; orange boxes delineate where the anomaly occurs. The red regions in the bottom graphs indicate the frame-level ground-truth labels of anomalous events. Right: snapshot of the user interface showing the cameras and informative text related to the detected anomaly.
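
The workflow in the Figure 1 caption can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It rests on several assumptions: random arrays stand in for the VST features $F_{S}$, $F_{M}$, and $F_{L}$; the feature dimension (64), the number of heads (4), and the residual-sum fusion rule are hypothetical; and all projection weights are untrained. The function names (`split_into_segments`, `fuse_granularities`, `anomaly_scores`) are invented for illustration. The sketch only shows how cross-attention between granularities and self-attention across the 32 segments could yield a fused matrix $X$ and one score per segment.

```python
import numpy as np

def split_into_segments(num_frames, num_segments=32):
    """Return (start, end) frame ranges that cover the video in 32 segments."""
    bounds = np.linspace(0, num_frames, num_segments + 1).astype(int)
    return list(zip(bounds[:-1], bounds[1:]))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(query, key_value, num_heads, rng):
    """Scaled dot-product attention with random (untrained) projections."""
    t_q, d = query.shape
    t_k = key_value.shape[0]
    d_head = d // num_heads
    w_q, w_k, w_v, w_o = (rng.standard_normal((d, d)) / np.sqrt(d)
                          for _ in range(4))
    # Project, then split the feature axis into heads: (heads, time, d_head)
    q = (query @ w_q).reshape(t_q, num_heads, d_head).transpose(1, 0, 2)
    k = (key_value @ w_k).reshape(t_k, num_heads, d_head).transpose(1, 0, 2)
    v = (key_value @ w_v).reshape(t_k, num_heads, d_head).transpose(1, 0, 2)
    weights = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))
    heads = weights @ v
    # Merge heads back into one feature axis and apply the output projection
    return heads.transpose(1, 0, 2).reshape(t_q, d) @ w_o

def fuse_granularities(f_s, f_m, f_l, num_heads=4, seed=0):
    """Assumed fusion rule: cross-attend, sum residually, then self-attend."""
    rng = np.random.default_rng(seed)
    # Cross-attention: short-granularity features query the other two
    cross_m = multi_head_attention(f_s, f_m, num_heads, rng)
    cross_l = multi_head_attention(f_s, f_l, num_heads, rng)
    fused = f_s + cross_m + cross_l
    # Self-attention: model dependencies among the video segments
    return fused + multi_head_attention(fused, fused, num_heads, rng)

def anomaly_scores(x, rng):
    """Hypothetical linear classifier with a sigmoid, one score per segment."""
    w = rng.standard_normal(x.shape[1]) / np.sqrt(x.shape[1])
    return 1.0 / (1.0 + np.exp(-(x @ w)))

# Usage: random stand-ins for the three VST feature matrices (32 segments each)
rng = np.random.default_rng(1)
f_s, f_m, f_l = (rng.standard_normal((32, 64)) for _ in range(3))
x = fuse_granularities(f_s, f_m, f_l)   # shape (32, 64)
scores = anomaly_scores(x, rng)         # shape (32,)
```

Because the weights are random, the scores carry no meaning; in a trained model the projections and the classifier would be learned from labelled video.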