
Balancing Accuracy and Training Time in Federated Learning for Violence Detection in Surveillance Videos: A Study of Neural Network Architectures

Pajon Quentin, Serre Swan, Wissocq Hugo, Rabaud Léo, Haidar Siba, Yaacoub Antoun

TL;DR

This study addresses privacy-preserving violence detection in surveillance video by combining federated learning with Diff-Gated, a lightweight architecture that replaces optical flow with frame differences. It pairs transfer learning with One-Cycle training to accelerate convergence, and it uses spatio-temporal features to improve accuracy while reducing computation. The authors establish a method for preparing federated-learning-ready datasets and show that Diff-Gated achieves higher accuracy and lower preprocessing and training times than Flow-Gated, including in a federated learning (FL) setup based on FedAvg. The work highlights the practical viability of privacy-conscious, efficient violence detection in real-world CCTV centers, while acknowledging memory constraints and the challenges of non-IID data, and it proposes directions for future FL improvements.
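The core idea behind Diff-Gated, as summarized above, is to feed the network differences between consecutive frames instead of optical flow. The following is a minimal NumPy illustration of that preprocessing step, not the authors' code. The function name `frame_differences`, the (T, H, W, C) array layout, and the use of absolute rather than signed differences are assumptions made for illustration.

```python
import numpy as np


def frame_differences(frames: np.ndarray) -> np.ndarray:
    """Compute absolute differences between consecutive video frames.

    Hypothetical helper (not from the paper). `frames` has shape (T, H, W, C),
    e.g. uint8 RGB frames; the result has shape (T - 1, H, W, C), float32.
    """
    if frames.ndim != 4 or frames.shape[0] < 2:
        raise ValueError("expected an array of shape (T, H, W, C) with T >= 2")
    # Cast before subtracting so that uint8 values do not wrap around.
    x = frames.astype(np.float32)
    # Assumption: absolute difference; a signed difference is another option.
    return np.abs(x[1:] - x[:-1])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.integers(0, 256, size=(16, 112, 112, 3)).astype(np.uint8)
    diffs = frame_differences(clip)
    print(diffs.shape)  # (15, 112, 112, 3)
```

A single subtraction per pixel is much cheaper than estimating dense optical flow, which is consistent with the lower preprocessing time the study reports for Diff-Gated.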

Abstract

This paper investigates machine learning techniques for violence detection in videos and their adaptation to a federated learning context. The study includes experiments with spatio-temporal features extracted from benchmark video datasets, a comparison of different methods, and a proposed modification of the "Flow-Gated" architecture, called "Diff-Gated." It also explores several machine learning techniques, including super-convergence and transfer learning, and develops a method for adapting centralized datasets to a federated learning context. By training the best-performing violence detection model in a federated learning context, the research achieves higher accuracy than state-of-the-art models.
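The study trains its best model in a federated setting; the TL;DR names FedAvg as the aggregation scheme. The following is a minimal NumPy sketch of the FedAvg aggregation step, in which the server averages client parameters weighted by each client's number of local samples. The IID random split in `split_into_clients` is a hypothetical stand-in for the paper's dataset-adaptation method, which is not detailed here; a realistic split across CCTV centers would likely be non-IID. Both function names are illustrative.

```python
import numpy as np


def split_into_clients(num_samples, num_clients, seed=0):
    """Hypothetical IID split: shuffle sample indices and cut them into shards."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(num_samples), num_clients)


def fedavg(client_weights, client_sizes):
    """FedAvg aggregation: average each layer, weighted by local sample count.

    client_weights: one list of np.ndarray layers per client (same shapes).
    client_sizes: number of local training samples held by each client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = float(sum(client_sizes))
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged = []
    for i in range(len(client_weights[0])):
        layer = np.zeros_like(client_weights[0][i], dtype=np.float64)
        for weights, n in zip(client_weights, client_sizes):
            # Each client contributes in proportion to its share of the data.
            layer += (n / total) * weights[i]
        averaged.append(layer)
    return averaged


if __name__ == "__main__":
    shards = split_into_clients(num_samples=1000, num_clients=4)
    print([len(s) for s in shards])  # [250, 250, 250, 250]

    # Two clients with one-layer "models"; the second holds 3x more data.
    client_a = [np.array([1.0, 1.0])]
    client_b = [np.array([3.0, 3.0])]
    global_model = fedavg([client_a, client_b], client_sizes=[1, 3])
    print(global_model[0])  # [2.5 2.5]
```

In a full FedAvg round, each client would first train locally, starting from the current global model, then send its updated parameters to the server and receive the averaged result. Only the aggregation step and a toy data split are shown here.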

Paper Structure

This paper contains 20 sections, 3 figures, and 7 tables.

Figures (3)

  • Figure 1: Structure of the feature extraction architecture [sernani_deep_2021]
  • Figure 2: Structure of the transfer learning architecture [sernani_deep_2021]
  • Figure 3: Structure of the Flow-Gated architecture [cheng_rwf-2000_2020]