Abnormal Event Detection In Videos Using Deep Embedding

Darshan Venkatrayappa

TL;DR

The paper tackles video anomaly detection without labeled anomalies by proposing a three-part hybrid framework. It fuses depth, motion, and appearance features through a Central-Net-inspired fusion block and then applies a one-class hypersphere objective that maps normal data toward a hypercenter $c$ by minimizing $\|\phi(x)-c\|^2$, where $\phi(x)$ is the encoder's embedding of the fused input $x$. A convolutional autoencoder is first pre-trained on the fused features, and its encoder is then fine-tuned to pull embeddings toward $c$, so that anomalies can be detected by their distance to the center. Evaluations on UCSD Ped2, CUHK Avenue, and ShanghaiTech show results competitive with other unsupervised methods, supporting the benefits of multi-modal fusion and the hypercenter approach. The work highlights the practical potential of unsupervised, multi-modal embedding learning for scalable surveillance analytics, with future directions including additional modalities such as pose and audio and joint training of the fusion components.

Abstract

Abnormal event detection, or anomaly detection, in surveillance videos remains a challenge because of the diversity of possible events. Because anomalous events are lacking at training time, anomaly detection requires learning methods that work without supervision. In this work we propose an unsupervised approach for video anomaly detection that uses a hybrid architecture to jointly optimize the objectives of the deep neural network and the anomaly detection task. Initially, a convolutional autoencoder is pre-trained in an unsupervised manner on a fusion of depth, motion and appearance features. In the second step, we take the encoder part of the pre-trained autoencoder and extract the embeddings of the fused input. We then jointly train (fine-tune) the encoder to map the embeddings to a hypercenter. Thus, embeddings of normal data fall near the hypercenter, whereas embeddings of anomalous data fall far away from it.
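
The hypercenter idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the embeddings are hand-written 2-D vectors standing in for encoder outputs, the function names are hypothetical, and fixing $c$ as the mean of the normal embeddings is an assumption borrowed from common one-class practice rather than something stated in this summary. The sketch only shows the quantity being minimized, the mean of $\|\phi(x)-c\|^2$ over normal data, and how the same distance can serve as an anomaly score.

```python
from typing import List, Sequence

Vector = Sequence[float]


def hypercenter(embeddings: Sequence[Vector]) -> List[float]:
    """Assumed choice of c: the mean embedding of the normal training data."""
    n = len(embeddings)
    dim = len(embeddings[0])
    return [sum(e[d] for e in embeddings) / n for d in range(dim)]


def squared_distance(z: Vector, c: Vector) -> float:
    """Squared Euclidean distance ||z - c||^2 between an embedding and c."""
    return sum((zi - ci) ** 2 for zi, ci in zip(z, c))


def one_class_loss(embeddings: Sequence[Vector], c: Vector) -> float:
    """Mean squared distance to c; fine-tuning would push this toward zero."""
    return sum(squared_distance(z, c) for z in embeddings) / len(embeddings)


def anomaly_scores(embeddings: Sequence[Vector], c: Vector) -> List[float]:
    """Score each embedding by its squared distance to c (larger = more anomalous)."""
    return [squared_distance(z, c) for z in embeddings]


# Toy stand-ins for encoder outputs on normal training frames.
normal = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
c = hypercenter(normal)          # [1.0, 1.0]
loss = one_class_loss(normal, c)  # 2.0

# One test embedding at the center and one far from it.
scores = anomaly_scores([[1.0, 1.0], [4.0, 5.0]], c)  # [0.0, 25.0]
```

In an actual system the loss would be backpropagated through the encoder rather than evaluated on fixed vectors, and a threshold on the score would separate normal from anomalous frames; both steps are omitted here.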

Paper Structure (10 sections, 3 equations, 4 figures, 2 tables)


Figures (4)

  • Figure 1: Overview of the method
  • Figure 2: Architecture of fusion block
  • Figure 3: Anomaly detection result on Ped2 dataset, Sequence 04. GT denotes the ground truth (blue curve). The green curve shows the detection from our approach.
  • Figure 4: Anomaly detection result on Avenue dataset, Sequence 05. GT denotes the ground truth (blue curve). The green curve shows the detection from our approach.