Table of Contents
Fetching ...

Violence detection in videos using deep recurrent and convolutional neural networks

Abdarahmane Traoré, Moulay A. Akhloufi

TL;DR

This work proposes a deep learning architecture for violence detection, which combines both recurrent neural networks and 2-dimensional convolutional neural networks (2D CNN) and uses optical flow computed using the captured sequences to encode the movements in the scenes.

Abstract

Violence and abnormal behavior detection research have known an increase of interest in recent years, due mainly to a rise in crimes in large cities worldwide. In this work, we propose a deep learning architecture for violence detection which combines both recurrent neural networks (RNNs) and 2-dimensional convolutional neural networks (2D CNN). In addition to video frames, we use optical flow computed using the captured sequences. CNN extracts spatial characteristics in each frame, while RNN extracts temporal characteristics. The use of optical flow allows to encode the movements in the scenes. The proposed approaches reach the same level as the state-of-the-art techniques and sometime surpass them. It was validated on 3 databases achieving good results.

Violence detection in videos using deep recurrent and convolutional neural networks

TL;DR

This work proposes a deep learning architecture for violence detection, which combines both recurrent neural networks and 2-dimensional convolutional neural networks (2D CNN) and uses optical flow computed using the captured sequences to encode the movements in the scenes.

Abstract

Violence and abnormal behavior detection research have known an increase of interest in recent years, due mainly to a rise in crimes in large cities worldwide. In this work, we propose a deep learning architecture for violence detection which combines both recurrent neural networks (RNNs) and 2-dimensional convolutional neural networks (2D CNN). In addition to video frames, we use optical flow computed using the captured sequences. CNN extracts spatial characteristics in each frame, while RNN extracts temporal characteristics. The use of optical flow allows to encode the movements in the scenes. The proposed approaches reach the same level as the state-of-the-art techniques and sometime surpass them. It was validated on 3 databases achieving good results.
Paper Structure (17 sections, 2 equations, 7 figures, 3 tables)

This paper contains 17 sections, 2 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Proposed architecture pipeline
  • Figure 2: MBCONV block of EfficientNet
  • Figure 3: EfficientNetB0 used to capture spatial features
  • Figure 4: PWC-NET architecture
  • Figure 5: Frames from Hockey Dataset, on the left we have the first frame, in the middle we have the second frame, on the right we have the optical flow compute using the first and the second frame.
  • ...and 2 more figures