Hybrid Architecture for Real-Time Video Anomaly Detection: Integrating Spatial and Temporal Analysis

Fabien Poirier

TL;DR

The work tackles real-time video anomaly detection by fusing spatial object detection with temporal sequence modeling in a modular hybrid architecture built from YOLOv7 and a VGG19+GRU network. It presents two deployment modes, parallel and serial, that trade speed against precision, and investigates preprocessing strategies such as background removal and pose-based representations. Experiments on proprietary data show that the parallel configuration yields faster inference with competitive accuracy, while the serial, pose-enhanced pipeline improves detection of human-behavior anomalies at the cost of some environmental cues. The findings highlight the value of adaptable fusion strategies for surveillance and event management, and point to future directions including richer anomaly classes and dataset expansion.

Abstract

In this paper, we propose a new architecture for real-time anomaly detection in video data. Inspired by human behavior, it combines spatial and temporal analyses using two distinct models: (i) for temporal analysis, a recurrent convolutional network (CNN + RNN) that pairs VGG19 with a GRU to process video sequences; (ii) for spatial analysis, YOLOv7, which analyzes individual images. The two analyses can be carried out either in parallel, with a final prediction that combines the results of both analyses, or in series, where the spatial analysis enriches the data before the temporal analysis. Experiments were conducted to compare these two architectural configurations and to evaluate the effectiveness of our hybrid approach to video anomaly detection.
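The difference between the parallel and serial configurations can be illustrated with a minimal sketch. This is not the paper's implementation: `spatial_score`, `temporal_score`, and `enrich` are hypothetical callables standing in for YOLOv7, the VGG19+GRU network, and the spatial enrichment step, and both the max-over-frames aggregation and the weighted-average fusion rule are assumptions, since the abstract only says that the final prediction combines both results.

```python
from typing import Callable, Sequence

# Hypothetical stand-in for a decoded video frame (any object works here).
Frame = object


def parallel_predict(
    frames: Sequence[Frame],
    spatial_score: Callable[[Frame], float],
    temporal_score: Callable[[Sequence[Frame]], float],
    spatial_weight: float = 0.5,
) -> float:
    """Parallel mode: both analyses see the raw frames; scores are fused afterwards."""
    if not frames:
        raise ValueError("frames must be non-empty")
    # Assumption: the clip-level spatial score is the highest per-frame score.
    spatial = max(spatial_score(frame) for frame in frames)
    temporal = temporal_score(frames)
    # Assumption: a weighted average is used as the fusion rule.
    return spatial_weight * spatial + (1.0 - spatial_weight) * temporal


def serial_predict(
    frames: Sequence[Frame],
    enrich: Callable[[Frame], Frame],
    temporal_score: Callable[[Sequence[Frame]], float],
) -> float:
    """Serial mode: the spatial step enriches each frame before temporal analysis."""
    if not frames:
        raise ValueError("frames must be non-empty")
    # The spatial model acts as a preprocessing stage (e.g. drawing detections
    # or poses onto the frame); the temporal model alone makes the decision.
    enriched = [enrich(frame) for frame in frames]
    return temporal_score(enriched)
```

In the parallel sketch the two scoring calls are independent, so they could run concurrently; in the serial sketch the temporal model has to wait for the enrichment step, which is consistent with the speed difference the TL;DR reports.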

Paper Structure

This paper contains 13 sections, 5 figures, 7 tables.

Figures (5)

  • Figure 1: Spatio-temporal Video Analysis Architecture
  • Figure 2: Structure of the Temporal Analysis Component VGG-GRU. The GRU layer has a dropout rate set to 50%, as do all Dense layers, which also have L2 regularization fixed at 0.01.
  • Figure 3: Example of Mask Generated with YOLO
  • Figure 4: Pose Estimation by YOLOv7 with background
  • Figure 5: Pose Estimation by YOLOv7 without background