Recognition of Abnormal Events in Surveillance Videos using Weakly Supervised Dual-Encoder Models

Noam Tsfaty; Avishai Weizman; Liav Cohen; Moshe Tshuva; Yehudit Aperstein

TL;DR

The paper tackles weakly supervised anomaly detection in surveillance video, using only video-level labels. It introduces a dual-encoder multiple-instance learning (MIL) framework that fuses spatiotemporal I3D features with TimeSformer transformer representations. Each video is represented by 32 uniform 16-frame segments; the model produces a score for each segment, and top-k pooling aggregates these scores into a video-level prediction, which is trained with binary cross-entropy. The method achieves an AUC of 90.7% on the UCF-Crime dataset, outperforming a range of baselines. The results demonstrate that combining complementary encoders and weak supervision can yield robust anomaly detection suitable for real-world surveillance applications.

Abstract

We address the challenge of detecting rare and diverse anomalies in surveillance videos using only video-level supervision. Our dual-backbone framework combines convolutional and transformer representations through top-k pooling, achieving 90.7% area under the curve (AUC) on the UCF-Crime dataset.
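
The scoring pipeline summarized above (per-segment scores, top-k pooling, video-level binary cross-entropy) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature dimensions (`I3D_DIM`, `TSF_DIM`), fusion by concatenation, the single linear scoring layer, averaging the top-k scores, and the value `k=3` are all assumptions chosen for illustration, and random arrays stand in for real encoder outputs.

```python
import numpy as np

NUM_SEGMENTS = 32   # segments per video, as stated in the summary
I3D_DIM = 1024      # assumed I3D feature size (hypothetical)
TSF_DIM = 768       # assumed TimeSformer feature size (hypothetical)


def topk_video_score(segment_scores: np.ndarray, k: int) -> float:
    """Aggregate per-segment scores into one video score.

    Assumption: the video score is the mean of the k highest segment scores.
    """
    if not 1 <= k <= segment_scores.shape[0]:
        raise ValueError("k must be between 1 and the number of segments")
    top = np.sort(segment_scores)[-k:]  # k largest values
    return float(top.mean())


def binary_cross_entropy(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy between a predicted probability p and a label y."""
    p = float(np.clip(p, eps, 1.0 - eps))  # avoid log(0)
    return float(-(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


rng = np.random.default_rng(0)

# Random stand-ins for the two encoders' per-segment features.
i3d_feats = rng.standard_normal((NUM_SEGMENTS, I3D_DIM))
tsf_feats = rng.standard_normal((NUM_SEGMENTS, TSF_DIM))

# Assumption: the two representations are fused by concatenation.
fused = np.concatenate([i3d_feats, tsf_feats], axis=1)

# Assumption: a single linear layer plus sigmoid scores each segment.
weights = rng.standard_normal(fused.shape[1]) * 0.01
bias = 0.0
segment_scores = 1.0 / (1.0 + np.exp(-(fused @ weights + bias)))

# Only the video-level label (1 = anomalous) supervises the pooled score.
video_score = topk_video_score(segment_scores, k=3)
loss = binary_cross_entropy(video_score, y=1.0)
```

Because the loss is computed on the pooled score alone, no segment-level annotation is needed; the highest-scoring segments carry the video-level label during training.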
