Dynamic Distinction Learning: Adaptive Pseudo Anomalies for Video Anomaly Detection

Demetris Lappas; Vasileios Argyriou; Dimitrios Makris

TL;DR

DDL addresses video anomaly detection by combining dynamically weighted pseudo anomalies with a Distinction Loss, so that training does not depend on a fixed anomaly threshold. It introduces a Pseudo Anomaly Creator, a Conv3DSkipUNet (C3DSU) reconstruction model, and a loss $L = L_{recon} + \lambda L_{dist}$, where $L_{dist}$ encourages the model to reconstruct pseudo anomalies toward the corresponding normal frames. Evaluations on Ped2, Avenue, and ShanghaiTech show strong performance, reaching 98.46% AUC on Ped2 and 90.35% AUC on Avenue, with ShanghaiTech benefiting from per-scene adaptation. Ablations confirm that dynamic weighting and the Distinction Loss help on both UNet and C3DSU architectures, which suggests the approach transfers across reconstruction backbones and scales to scene-specific surveillance settings.
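The combined objective $L = L_{recon} + \lambda L_{dist}$ can be sketched as follows. This is a minimal illustration and not the authors' implementation: the ratio form of the distinction term is an assumption inferred from the Figure 1 caption (which speaks of minimizing a numerator and maximizing a denominator) and from the error terms listed for Figure 4. The function names, the use of mean squared error, the `eps` stabilizer, and the flattened-list frame representation are all hypothetical choices made for the example.

```python
from typing import Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized, flattened frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distinction_loss(x_t, x_a_t, recon_a, eps: float = 1e-8) -> float:
    """Assumed ratio form of L_dist.

    Numerator: error between f(X_A) and the normal middle frame X^t.
    Denominator: error between f(X_A) and the pseudo-anomalous frame X_A^t.
    The loss is small when the reconstruction looks normal, not anomalous.
    """
    return mse(recon_a, x_t) / (mse(recon_a, x_a_t) + eps)


def total_loss(x_t, x_a_t, recon_n, recon_a, lam: float = 1.0) -> float:
    """L = L_recon + lambda * L_dist, with L_recon computed on the normal input."""
    return mse(recon_n, x_t) + lam * distinction_loss(x_t, x_a_t, recon_a)
```

Under this assumed form, a reconstruction of the pseudo-anomalous input that matches the normal frame drives the distinction term to zero, while a reconstruction that copies the pseudo anomaly makes it very large.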

Abstract

We introduce Dynamic Distinction Learning (DDL), a novel video anomaly detection methodology that combines pseudo-anomalies, dynamic anomaly weighting, and a distinction loss function to improve detection accuracy. By training on pseudo-anomalies, our approach adapts to the variability of normal and anomalous behaviors without fixed anomaly thresholds. Our model achieves superior performance on the Ped2, Avenue, and ShanghaiTech datasets, with an individual model tailored to each scene. These results highlight DDL's effectiveness in advancing anomaly detection, offering a scalable and adaptable solution for video surveillance challenges.

Paper Structure (18 sections, 7 equations, 8 figures, 3 tables)

Introduction
Related Work
Methodology
Pseudo Anomaly Creator
Reconstruction Model Definition
Loss Function
Reconstruction Loss
Distinction Loss
Inference
Datasets
Results
Ablation Studies
Conclusion
Dynamics of Anomaly Weight $\sigma(\ell)$ in Model Training (Methodology Supplementary)
Visualizing the Effects of Training
...and 3 more sections

Figures (8)

Figure 1: The Dynamic Distinction Learning (DDL) Architecture: This diagram illustrates the DDL model's workflow, including object detection and tracking, random object masking, pseudo anomaly creation, our C3DSU model, and the distinction loss calculation. The architecture depicts how the pseudo anomalies are created and then passed through the model along with their normal counterparts. The diagram also provides a visual depiction of the distinction loss calculation, showing how the model learns to minimize the numerator and maximize the denominator.
Figure 2: Pseudo-Anomaly Creation Process: This figure demonstrates the step-by-step procedure for generating pseudo-anomalies within video frames. The process takes the normal input frames, the masked frames, and a dynamically learned anomaly weight as inputs, then applies a noise tensor modulated by that weight.
Figure 3: Panel (a) depicts a scenario where $\sigma(\ell)$ approaches zero, leading to minimal deviation from the original frame and challenging the model's ability to distinguish between normal and anomalous regions due to the lack of significant noise. Panel (b) illustrates the opposite extreme, where $\sigma(\ell)$ is near one, resulting in an overly distorted anomalous region dominated by noise, which challenges the model's reconstruction capabilities and undermines the distinction loss's effectiveness.
Figure 4: Visual Comparison of Model Training Effects: This figure provides a comprehensive visualization of the model's performance across different frames and stages of reconstruction. It features the original middle frame $X^t$, the reconstructed frame from normal input $f(X)$, the pseudo-anomalous middle frame $X_A^t$, the reconstructed frame from the pseudo-anomalous input $f(X_A)$, and the reconstruction errors $\lVert X^t-f(X) \rVert$, $\lVert X_A^t-f(X_A) \rVert$, and $\lVert X^t-f(X_A) \rVert$.
Figure 5: Evolution of the anomaly weight $\sigma(\ell)$ during the training of the C3DSU model on the Ped2 Dataset for 10 epochs.
...and 3 more figures
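The pseudo-anomaly creation described in the captions of Figures 2 and 3 can be sketched as below. This is a rough illustration under stated assumptions, not the paper's exact procedure: it assumes $\sigma(\ell)$ is a sigmoid of a learnable scalar $\ell$, that noise is blended into masked pixels as a convex combination weighted by $\sigma(\ell)$ (consistent with Figure 3, where a weight near zero leaves the frame almost unchanged and a weight near one lets noise dominate), and that the noise is uniform on [0, 1). The function names and the flattened-list frame format are hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(ell: float) -> float:
    """Anomaly weight sigma(ell), assumed to be a logistic function of ell."""
    return 1.0 / (1.0 + math.exp(-ell))


def make_pseudo_anomaly(
    frame: Sequence[float],
    mask: Sequence[int],
    ell: float,
    rng: random.Random,
) -> List[float]:
    """Blend noise into the masked pixels of a flattened frame.

    frame: pixel intensities in [0, 1]
    mask:  1 where a selected object lies, 0 elsewhere
    ell:   learnable scalar; sigmoid(ell) controls the noise strength
    """
    weight = sigmoid(ell)
    out = []
    for pixel, m in zip(frame, mask):
        noise = rng.random()  # uniform noise in [0, 1), an assumption
        blended = (1.0 - weight) * pixel + weight * noise
        # Pixels outside the mask are copied through unchanged.
        out.append(m * blended + (1 - m) * pixel)
    return out
```

For example, `make_pseudo_anomaly(frame, mask, ell=0.0, rng=random.Random(0))` mixes each masked pixel half-and-half with noise, since `sigmoid(0.0)` is 0.5. In a real training setup `ell` would be a trainable parameter updated by the optimizer, which this standard-library sketch does not model.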
