An Exploratory Study on Human-Centric Video Anomaly Detection through Variational Autoencoders and Trajectory Prediction

Ghazal Alinezhad Noghre, Armin Danesh Pazho, Hamed Tabkhi

TL;DR

This paper introduces TSGAD, a novel human-centric Two-Stream Graph-Improved Anomaly Detection method that combines Variational Autoencoders (VAEs) with trajectory prediction. It explores VAEs as a new approach to pose-based human-centric Video Anomaly Detection (VAD) alongside the benefits of trajectory prediction.

Abstract

Video Anomaly Detection (VAD) is a challenging and prominent research task within computer vision. In recent years, Pose-based Video Anomaly Detection (PAD) has drawn considerable attention from the research community because of several inherent advantages over pixel-based approaches, despite its occasionally suboptimal performance. Specifically, PAD offers reduced computational complexity, intrinsic privacy preservation, and mitigation of concerns related to discrimination and bias against specific demographic groups. This paper introduces TSGAD, a novel human-centric Two-Stream Graph-Improved Anomaly Detection method leveraging Variational Autoencoders (VAEs) and trajectory prediction. TSGAD aims to explore the possibility of utilizing VAEs as a new approach for pose-based human-centric VAD alongside the benefits of trajectory prediction. We demonstrate TSGAD's effectiveness through comprehensive experimentation on benchmark datasets. TSGAD achieves results comparable to state-of-the-art methods, showcasing the potential of adopting variational autoencoders. This suggests a promising direction for future research. The code base for this work is available at https://github.com/TeCSAR-UNCC/TSGAD.

Paper Structure

This paper contains 29 sections, 11 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: TSGAD architecture. The upper branch uses a Graph Attentive Variational Autoencoder (GA-VAE) to learn the distribution of normal human behavior in an unsupervised manner. The lower branch uses a state-of-the-art (SotA) trajectory prediction method, Pishgu [alinezhad2023pishgu], to learn to predict normal trajectories. $P_t^i$ denotes the $i^{th}$ person at time $t$, and $D$, $\mu$, and $\sigma$ refer to the latent representation's dimension, mean, and variance, respectively. $z$ follows a standard normal distribution, $z \sim \mathcal{N}(0, I)$, where $I$ is the identity matrix.
  • Figure 2: Nine spatio-temporal graph convolution blocks are stacked to form the GA-VAE encoder. Each block consists of a spatial attention graph convolution followed by a temporal convolution, batch normalization, a residual connection, and a final activation function.
  • Figure 3: The inference phase. The deviation from the API in the latent space is used to calculate the pose score ($S_{Pose}$). The difference between the predicted and actual trajectories, measured by mean squared error (MSE), forms the trajectory score ($S_{Traj}$). The weighted sum of these normalized scores forms the final anomaly score. $\mu_n$ and $\sigma_n$ refer to the mean and variance of the latent representation, and API is the Aggregated Parameter Index defined in Eq. \ref{eq:api}.
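
The score fusion described in the Figure 3 caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. The pose scores are taken as given inputs because the API definition is not reproduced on this page. Min-max normalization, the convex weighting via the hypothetical `pose_weight` parameter, and all function names are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def trajectory_mse(predicted: Sequence[Point], actual: Sequence[Point]) -> float:
    """Mean over time steps of the squared distance between predicted and actual (x, y)."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    squared = [
        (px - ax) ** 2 + (py - ay) ** 2
        for (px, py), (ax, ay) in zip(predicted, actual)
    ]
    return sum(squared) / len(squared)


def min_max_normalize(scores: Sequence[float]) -> List[float]:
    """Rescale scores to [0, 1]. Assumption: the caption does not name the normalization."""
    if not scores:
        raise ValueError("scores must be non-empty")
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # All scores are identical, so no sample stands out from the others.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(
    pose_scores: Sequence[float],
    traj_scores: Sequence[float],
    pose_weight: float = 0.5,  # hypothetical weight, not a value from the paper
) -> List[float]:
    """Weighted sum of normalized pose and trajectory scores, one value per sample."""
    if len(pose_scores) != len(traj_scores):
        raise ValueError("score lists must have the same length")
    if not 0.0 <= pose_weight <= 1.0:
        raise ValueError("pose_weight must lie in [0, 1]")
    pose_n = min_max_normalize(pose_scores)
    traj_n = min_max_normalize(traj_scores)
    return [
        pose_weight * p + (1.0 - pose_weight) * t
        for p, t in zip(pose_n, traj_n)
    ]
```

Here `trajectory_mse` would be called once per person and time window to produce each entry of `traj_scores`. A higher fused value marks a sample as more anomalous under these assumptions.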