Balancing Accuracy and Training Time in Federated Learning for Violence Detection in Surveillance Videos: A Study of Neural Network Architectures

Pajon Quentin; Serre Swan; Wissocq Hugo; Rabaud Léo; Haidar Siba; Yaacoub Antoun

TL;DR

This study addresses privacy-preserving violence detection in surveillance video using federated learning and a lightweight Diff-Gated architecture that replaces optical flow with frame differences. It combines transfer learning with One-Cycle training to accelerate convergence and uses spatio-temporal features to improve accuracy while reducing computation. The authors establish a method for preparing datasets for federated learning and show that Diff-Gated achieves higher accuracy and lower preprocessing and training times than Flow-Gated, including in a FedAvg-based federated setup. The work highlights the practical viability of privacy-conscious, efficient violence detection in real-world CCTV centers, acknowledges the challenges posed by memory limits and non-IID data, and proposes directions for future federated learning improvements.
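To make the frame-difference idea concrete, here is a minimal sketch of how consecutive frames could be differenced to obtain a cheap motion signal in place of optical flow. It is an illustration only, not the authors' preprocessing: the function name `frame_differences`, the `(T, H, W, C)` clip layout, and the choice of a signed `int16` output are all assumptions.

```python
import numpy as np


def frame_differences(frames: np.ndarray) -> np.ndarray:
    """Return signed differences between consecutive frames.

    Assumed input layout (hypothetical): (T, H, W, C) with uint8 pixels.
    Output has shape (T - 1, H, W, C) and dtype int16.
    """
    if frames.ndim != 4 or frames.shape[0] < 2:
        raise ValueError("expected a clip of shape (T, H, W, C) with T >= 2")
    # Cast before subtracting so negative changes do not wrap around in uint8.
    signed = frames.astype(np.int16)
    return signed[1:] - signed[:-1]


if __name__ == "__main__":
    # Toy clip: 3 frames of 2x2 RGB pixels with uniform brightness per frame.
    clip = np.zeros((3, 2, 2, 3), dtype=np.uint8)
    clip[1] = 10
    clip[2] = 4
    diffs = frame_differences(clip)
    print(diffs.shape, diffs[0, 0, 0, 0], diffs[1, 0, 0, 0])
```

A single subtraction per frame pair is far cheaper than a dense optical flow estimate, which is consistent with the lower preprocessing time the summary reports for Diff-Gated.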

Abstract

This paper investigates machine learning techniques for violence detection in videos and their adaptation to a federated learning context. The study includes experiments with spatio-temporal features extracted from benchmark video datasets, a comparison of different methods, and a proposed modification of the "Flow-Gated" architecture called "Diff-Gated." It also explores several training techniques, including super-convergence and transfer learning, and develops a method for adapting centralized datasets to a federated learning context. Training the best-performing violence detection model in a federated learning context yields better accuracy than state-of-the-art models.
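The TL;DR mentions a FedAvg-based setup for the federated experiments. As a rough illustration of the aggregation step in FedAvg, the sketch below averages per-layer client weights in proportion to each client's number of local samples. The function name `fedavg` and the representation of a model as a list of NumPy arrays are assumptions made for this example; the paper's actual framework and code are not shown here.

```python
import numpy as np


def fedavg(client_weights, client_sizes):
    """Sample-size-weighted average of client model weights.

    client_weights: list (one entry per client) of lists of np.ndarray layers.
    client_sizes:   number of local training samples held by each client.
    Both structures are hypothetical stand-ins for a real model state.
    """
    if len(client_weights) != len(client_sizes) or not client_weights:
        raise ValueError("need one sample count per client")
    total = float(sum(client_sizes))
    if total <= 0:
        raise ValueError("total number of samples must be positive")

    averaged = []
    for layer in range(len(client_weights[0])):
        acc = np.zeros_like(client_weights[0][layer], dtype=np.float64)
        for weights, size in zip(client_weights, client_sizes):
            # Each client contributes in proportion to its share of the data.
            acc += (size / total) * weights[layer]
        averaged.append(acc)
    return averaged


if __name__ == "__main__":
    client_a = [np.array([1.0, 3.0])]
    client_b = [np.array([3.0, 7.0])]
    print(fedavg([client_a, client_b], [1, 3]))
```

Weighting by sample count means a client with more local video clips pulls the global model further toward its update, which is one reason non-IID data across clients is a concern in this setting.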

Paper Structure (20 sections, 3 figures, 7 tables)

Introduction
Related work
Violence Detection
Federated Learning
Methodology
Limitation of Classical Classifiers
Transfer Learning and Early Stopping
Transfer Learning and One-Cycle Training
Multi-Channel Input Models using Optical Flow
Multi-Channel Input Models using Frame Differences
Experiment
Dataset Selection
Experimental Setup: Hardware and Software
Data Preparation for Federated Learning: Adapting Traditional Datasets
Federated Learning with Previously Tested Models
...and 5 more sections

Figures (3)

Figure 1: Structure of the feature extraction architecture [sernani_deep_2021]
Figure 2: Structure of the transfer learning architecture [sernani_deep_2021]
Figure 3: Structure of the Flow-Gated architecture [cheng_rwf-2000_2020]
