Lightning Fast Video Anomaly Detection via Adversarial Knowledge Distillation

Florinel-Alin Croitoru; Nicolae-Catalin Ristea; Dana Dascalescu; Radu Tudor Ionescu; Fahad Shahbaz Khan; Mubarak Shah

TL;DR

This work tackles real-time video anomaly detection by training a lightweight frame-level convolutional-transformer student to imitate two strong object-centric teachers through standard distillation ($\mathcal{L}_{\mathrm{KD}}$) and novel adversarial distillation ($\mathcal{L}_{\mathrm{AKD}}$). The approach uses a two-phase training regime, replaces heavy detectors with a downsampling, multi-head transformer that outputs multi-scale anomaly maps, and combines guidance from the two teachers to improve generalization. Empirical results on Avenue, ShanghaiTech, and UCSD Ped2 show that the method achieves an exceptional speed-accuracy balance, reaching up to $1480$ FPS with competitive micro/macro AUC scores while reducing GFLOPs and memory usage. The work demonstrates the practicality of deploying surveillance-scale anomaly detection with minimal hardware demands, and discusses avenues for strengthening accuracy via additional teachers and tasks.
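To make the combination of $\mathcal{L}_{\mathrm{KD}}$ and $\mathcal{L}_{\mathrm{AKD}}$ concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the mean-squared-error form of the direct loss, the non-saturating form of the adversarial loss, the per-teacher summation, and the names `kd_loss`, `akd_loss`, `student_loss` and `adv_weight` are all hypothetical choices made for this example.

```python
import math


def kd_loss(student_map, teacher_map):
    """Direct distillation term: mean squared error between two flattened
    anomaly maps. The MSE form is an assumption made for illustration."""
    if len(student_map) != len(teacher_map):
        raise ValueError("maps must have the same number of elements")
    diffs = [(s - t) ** 2 for s, t in zip(student_map, teacher_map)]
    return sum(diffs) / len(diffs)


def akd_loss(disc_prob_on_student, eps=1e-12):
    """Adversarial term seen by the student: -log D(student map), where D is
    the probability that a discriminator assigns to 'this map came from the
    teacher'. The non-saturating form is an assumption."""
    return -math.log(disc_prob_on_student + eps)


def student_loss(student_maps, teacher_maps, disc_probs, adv_weight=0.1):
    """Sum both terms over all teachers (one discriminator per teacher).
    adv_weight is a hypothetical balancing factor, not a reported value."""
    total = 0.0
    for s_map, t_map, prob in zip(student_maps, teacher_maps, disc_probs):
        total += kd_loss(s_map, t_map) + adv_weight * akd_loss(prob)
    return total


# Toy usage with two teachers and 3-element anomaly maps.
teacher_maps = [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3]]
student_maps = [[0.2, 0.7, 0.2], [0.1, 0.8, 0.1]]
disc_probs = [0.6, 0.4]  # discriminator outputs on the student's maps
loss = student_loss(student_maps, teacher_maps, disc_probs)
```

In this sketch the direct term pulls each student map toward the matching teacher map, while the adversarial term decreases as the corresponding discriminator becomes less able to tell the student's map from the teacher's.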

Abstract

We propose a very fast frame-level model for anomaly detection in video, which learns to detect anomalies by distilling knowledge from multiple highly accurate object-level teacher models. To improve the fidelity of our student, we distill the low-resolution anomaly maps of the teachers by jointly applying standard and adversarial distillation, introducing an adversarial discriminator for each teacher to distinguish between target and generated anomaly maps. We conduct experiments on three benchmarks (Avenue, ShanghaiTech, UCSD Ped2), showing that our method is over 7 times faster than the fastest competing method, and between 28 and 62 times faster than object-centric models, while obtaining comparable results to recent methods. Our evaluation also indicates that our model achieves the best trade-off between speed and accuracy, due to its previously unheard-of speed of 1480 FPS. In addition, we carry out a comprehensive ablation study to justify our architectural design choices. Our code is freely available at: https://github.com/ristea/fast-aed.
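The speed-up factors quoted in the abstract can be turned into rough frame rates for the competing methods. The short sketch below does only that arithmetic. The resulting values are bounds implied by the stated ratios, not measurements reported in the paper, and the helper name `implied_fps` is hypothetical.

```python
STUDENT_FPS = 1480  # frame rate stated in the abstract


def implied_fps(student_fps, speedup):
    """Frame rate of a method that the student outpaces by `speedup` times."""
    return student_fps / speedup


# "over 7 times faster than the fastest competing method":
# that competitor would run below this frame rate.
fastest_competitor_upper_bound = implied_fps(STUDENT_FPS, 7)

# "between 28 and 62 times faster than object-centric models":
# the implied range of frame rates for those models.
object_centric_low = implied_fps(STUDENT_FPS, 62)
object_centric_high = implied_fps(STUDENT_FPS, 28)

print(f"fastest competitor: below {fastest_competitor_upper_bound:.1f} FPS")
print(f"object-centric models: about {object_centric_low:.1f} "
      f"to {object_centric_high:.1f} FPS")
```

Under these ratios, the object-centric models sit in the tens of frames per second, which is the gap the frame-level student is designed to close.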

Paper Structure (14 sections, 8 equations, 12 figures, 11 tables)

Introduction
Related work
Method
Experiments
Data sets
Experimental setup
Results
Quantitative analysis
Qualitative analysis
Speed versus accuracy trade-off
Ablation studies
Reproducibility
Limitations
Conclusion

Figures (12)

Figure 1: Our pipeline comprises an efficient student model that learns to distill knowledge from two object-centric teachers via a combination of direct and adversarial losses. The pipeline can be trivially extended to any number of teachers. The knowledge distillation loss $\mathcal{L}_{\mathrm{KD}}$ is defined in Eq. \ref{pixel_distillation}, while the adversarial knowledge distillation loss $\mathcal{L}_{\mathrm{AKD}}$ is defined in Eq. \ref{adv_distillation}. For simplicity, we do not represent the multi-frame input and multi-resolution output of our student. Best viewed in color.
Figure 2: The trade-off between performance (micro AUC) and speed (FPS) for our student versus multiple state-of-the-art methods with available code [Georgescu-CVPR-2021, Georgescu-TPAMI-2021, Gong-ICCV-2019, Ionescu-ICCV-2017, Liu-CVPR-2018, Liu-ICCV-2021, Park-CVPR-2020, Park-WACV-2022, Ristea-CVPR-2022, Wang-ECCV-2022], on the Avenue data set. The running times of all methods are measured on a machine with an Nvidia GeForce RTX 3090 GPU with 24 GB of VRAM. Best viewed in color.
Figure 3: Architecture of our student based on a convolutional transformer and multiple downsampling blocks. For simplicity, we represent the adversarial discriminator and anomaly maps for only one teacher. Best viewed in color.
Figure 4: Comparing the frame-level anomaly scores of teachers $T_1$ [Georgescu-TPAMI-2021] and $T_2$ [Georgescu-CVPR-2021] with the scores of our student on test video 14 from Avenue. The anomaly localization examples are provided by the head with the highest resolution of our student. Best viewed in color.
Figure 5: Comparing the frame-level anomaly scores of teachers $T_1$ [Georgescu-TPAMI-2021] and $T_2$ [Georgescu-CVPR-2021] with the scores of our student on test video 1 from Avenue. The anomaly localization examples are provided by the head with the highest resolution of our student. Best viewed in color.
...and 7 more figures
