Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection

Ali Karami; Thi Kieu Khanh Ho; Narges Armanfard

TL;DR

The paper tackles skeleton-based video anomaly detection (SVAD) by addressing three core challenges: capturing spatio-temporal dependencies among joints, recognizing region-specific discrepancies in motion, and accounting for the infinite variation of human actions. It introduces GiCiSAD, a lightweight framework consisting of a Graph Attention-Based Forecasting module, a Graph-level Jigsaw Puzzle Maker for self-supervised region-level discrimination, and a Graph-based Conditional Diffusion Model to generate diverse future motions conditioned on past frames. The method achieves state-of-the-art AUROC on four benchmark SVAD datasets while using up to 40% fewer parameters than prior unsupervised approaches, highlighting both effectiveness and efficiency. By combining dynamic graph learning, challenging graph-level self-supervision, and diffusion-based diverse generation, GiCiSAD robustly detects anomalies across varied motions and regions, with practical potential for real-time surveillance applications.

Abstract

Skeleton-based video anomaly detection (SVAD) is a crucial task in computer vision. Accurately identifying abnormal patterns or events enables operators to promptly detect suspicious activities, thereby enhancing safety. Achieving this demands a comprehensive understanding of human motions, both at body and region levels, while also accounting for the wide variations of performing a single action. However, existing studies fail to simultaneously address these crucial properties. This paper introduces a novel, practical and lightweight framework, namely Graph-Jigsaw Conditioned Diffusion Model for Skeleton-based Video Anomaly Detection (GiCiSAD) to overcome the challenges associated with SVAD. GiCiSAD consists of three novel modules: the Graph Attention-based Forecasting module to capture the spatio-temporal dependencies inherent in the data, the Graph-level Jigsaw Puzzle Maker module to distinguish subtle region-level discrepancies between normal and abnormal motions, and the Graph-based Conditional Diffusion model to generate a wide spectrum of human motions. Extensive experiments on four widely used skeleton-based video datasets show that GiCiSAD outperforms existing methods with significantly fewer training parameters, establishing it as the new state-of-the-art.

Paper Structure (22 sections, 16 equations, 6 figures, 7 tables, 2 algorithms)

Introduction
Proposed Method
Graph Attention-Based Forecasting
Graph-level Jigsaw Puzzle Maker
Graph-based Conditional Diffusion Model
Experiments
Experimental Settings
Comparison with State-of-The-Art
Parameter Efficiency
Ablation Study
Conclusion
Related Work
Skeleton-based Video Anomaly Detection
Self-supervised Learning
Graph-based Approaches
...and 7 more sections

Figures (6)

Figure 1: The overall framework of GiCiSAD.
Figure 2: The overview of the graph-level Jigsaw puzzle-solving approach. Nodes with the same color formulate a subgraph. Note that although each node is required to have $\delta$ connections, for better visualization, this property is not strictly maintained in the figure.
Figure 3: Histograms of the anomaly scores for 50 future frames generated by Diffusion on the HR-STC dataset, for both cases of conditioning on normal and abnormal past frames.
Figure 4: Comparison of Encoder-based and Autoencoder-based conditioning mechanisms.
Figure 5: Visualization of the Intra-Community shuffling approach. Nodes with the same color formulate a subgraph. Note that although each node is required to have $\delta$ connections, for improved visualization, this property is not strictly maintained in the figure.
...and 1 more figure
