GlanceVAD: Exploring Glance Supervision for Label-efficient Video Anomaly Detection

Huaxin Zhang, Xiang Wang, Xiaohao Xu, Xiaonan Huang, Chuchu Han, Yuehuan Wang, Changxin Gao, Shanjun Zhang, Nong Sang

TL;DR

This paper presents a novel labeling paradigm, termed "glance annotation", that better balances anomaly detection accuracy against annotation cost, and proposes a customized GlanceVAD method that uses Gaussian kernels as the basic unit for composing the temporal anomaly distribution.

Abstract

In recent years, video anomaly detection has been extensively investigated in both unsupervised and weakly supervised settings to alleviate costly temporal labeling. Despite significant progress, these methods still produce unsatisfactory results, such as numerous false alarms, primarily because they lack precise temporal anomaly annotations. In this paper, we present a novel labeling paradigm, termed "glance annotation", to achieve a better balance between anomaly detection accuracy and annotation cost. Specifically, a glance annotation is a single random frame within each abnormal event, which is easy to obtain and cost-effective. To assess its effectiveness, we manually create glance annotations for two standard video anomaly detection datasets: UCF-Crime and XD-Violence. Additionally, we propose a customized GlanceVAD method that uses Gaussian kernels as the basic unit for composing the temporal anomaly distribution, enabling the learning of diverse and robust anomaly representations from the glance annotations. Through comprehensive analysis and experiments, we verify that the proposed labeling paradigm achieves an excellent trade-off between annotation cost and model performance. Extensive experimental results also demonstrate the effectiveness of our GlanceVAD approach, which significantly outperforms existing advanced unsupervised and weakly supervised methods. Code and annotations will be publicly available at https://github.com/pipixin321/GlanceVAD.

Paper Structure

This paper contains 24 sections, 11 equations, 9 figures, 6 tables, and 1 algorithm.

Figures (9)

  • Figure 1: Motivation illustration. (a): Given an unlabelled video containing abnormal events, full supervision annotates all of the anomaly timestamps, weak supervision provides only video-level labels (abnormal/normal), and the proposed glance supervision requires annotating only a single frame for each abnormal event. (b): The proposed glance supervision is highly label-efficient, outperforming existing state-of-the-art weakly supervised methods (e.g., UR-DMU, S3R, MSL) while keeping labeling costs acceptable.
  • Figure 2: Overall statistical results of glance annotations on the XD-Violence and UCF-Crime datasets. The left figure illustrates the frequency statistics of glance annotations within the video, while the right figure depicts the temporal position distribution of glance annotations.
  • Figure 3: Overview of the proposed method. We use a pretrained video feature extraction network, i.e., I3D, to extract snippet-level features from the videos. These features are then fed to existing MIL-based methods (e.g., MIL, RTFM, UR-DMU) to obtain anomaly scores. We propose Temporal Gaussian Splatting to generate pseudo-labels that supervise the anomaly scores of abnormal videos.
  • Figure 4: Parameter analysis of $r_g$ and $\alpha$ on UCF-Crime and XD-Violence.
  • Figure 5: Qualitative comparison of the baseline method (UR-DMU) and our method on the UCF-Crime dataset.
  • ...and 4 more figures
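
The Figure 3 caption describes generating pseudo-labels from glance annotations with Gaussian kernels. The following is a minimal sketch of that general idea, not the authors' Temporal Gaussian Splatting implementation. The function name `gaussian_pseudo_labels`, the fixed kernel width `sigma`, and the element-wise maximum used to merge kernels are all assumptions made for illustration; this page does not specify how the paper sets kernel widths or combines kernels.

```python
import math


def gaussian_pseudo_labels(num_snippets, glance_indices, sigma=2.0):
    """Build a snippet-level pseudo-label curve from glance annotations.

    One Gaussian kernel is centered on each glance-annotated snippet.
    `sigma` (kernel width, in snippets) is a hypothetical fixed value,
    and merging kernels with max() is an assumed design choice.
    """
    labels = [0.0] * num_snippets
    for g in glance_indices:
        for t in range(num_snippets):
            # The kernel equals 1.0 at the glance and decays with distance.
            kernel = math.exp(-0.5 * ((t - g) / sigma) ** 2)
            labels[t] = max(labels[t], kernel)
    return labels


# Hypothetical abnormal video with 32 snippets and two glance annotations.
labels = gaussian_pseudo_labels(num_snippets=32, glance_indices=[5, 20])
```

In this sketch the pseudo-label is highest at each annotated snippet and falls toward zero far from any glance, so a loss that regresses anomaly scores toward `labels` would apply its strongest positive supervision near the annotated frames.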