
FedVideoMAE: Efficient Privacy-Preserving Federated Video Moderation

Ziyuan Tao, Chuanzhi Xu, Sandaru Jayawardana, Wei Bao, Kanchana Thilakarathna, Teng Joon Lim

TL;DR

This work tackles privacy and bandwidth challenges in cloud-based video moderation by proposing FedVideoMAE, an on-device federated framework that couples self-supervised VideoMAE representations with LoRA-based parameter-efficient adaptation. By freezing the backbone and training only lightweight adapters, it achieves substantial communication savings (28.3×) while supporting differential privacy and secure aggregation. Experiments on RWF-2000 with 40 clients show strong baseline accuracy (77.25%) without DP and a robust 65–66% accuracy under strong privacy (ε ≤ 10), with a principled effective-SNR analysis explaining DP noise behavior in the PEFT regime. The results demonstrate practical, defense-in-depth privacy for on-device video moderation and provide design guidance for future privacy-utility improvements in edge analytics.
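To make the differential-privacy step concrete, the sketch below shows the generic clip-then-add-Gaussian-noise operation that DP-SGD is built on, applied here to a single flattened update vector for illustration. The function name `clip_and_noise`, the toy vector, and the values of `clip_norm` and `noise_multiplier` are assumptions chosen for this example; they are not taken from the paper or its repository, and real DP-SGD clips per-example gradients and tracks the privacy budget with an accountant.

```python
import math
import random


def clip_and_noise(update, clip_norm, noise_multiplier, rng):
    """Clip `update` to L2 norm `clip_norm`, then add Gaussian noise.

    Illustrative only: names and defaults are assumptions, not the paper's code.
    """
    norm = math.sqrt(sum(x * x for x in update))
    # Scale down only when the vector exceeds the clipping bound.
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    clipped = [x * scale for x in update]
    # Noise standard deviation is proportional to the clipping bound.
    sigma = noise_multiplier * clip_norm
    return [x + rng.gauss(0.0, sigma) for x in clipped]


rng = random.Random(0)
adapter_update = [0.8, -1.2, 0.5, 2.0]  # stand-in for a flattened adapter update
private_update = clip_and_noise(
    adapter_update, clip_norm=1.0, noise_multiplier=0.5, rng=rng
)
print(private_update)
```

Because the noise scale depends on the clipping bound rather than on the number of parameters, the total noise energy grows with the dimension of the vector being protected, which is one reason a small adapter is an attractive target for this mechanism.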

Abstract

The rapid growth of short-form video platforms increases the need for privacy-preserving moderation, as cloud-based pipelines expose raw videos to privacy risks, high bandwidth costs, and inference latency. To address these challenges, we propose an on-device federated learning framework for video violence detection that integrates self-supervised VideoMAE representations, LoRA-based parameter-efficient adaptation, and defense-in-depth privacy protection. Our approach reduces the trainable parameter count to 5.5M (~3.5% of a 156M backbone) and incorporates DP-SGD with configurable privacy budgets and secure aggregation. Experiments on RWF-2000 with 40 clients achieve 77.25% accuracy without privacy protection and 65–66% under strong differential privacy, while reducing communication cost by 28.3× compared to full-model federated learning. The code is available at: https://github.com/zyt-599/FedVideoMAE
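The parameter and communication figures above can be sanity-checked with a few lines of arithmetic. The sketch below uses only the rounded counts quoted in the abstract (5.5M trainable, 156M backbone); the 4-bytes-per-parameter (float32) payload size is an assumption made for illustration and cancels out of the ratio. With these rounded inputs the ratio comes out at roughly 28.4×, close to the reported 28.3×; the small gap presumably reflects rounding of the parameter counts.

```python
# Rounded figures quoted in the abstract.
trainable_params = 5.5e6  # parameters updated and communicated each round
backbone_params = 156e6   # frozen VideoMAE backbone

# Assumption (not from the paper): float32 weights, 4 bytes per parameter.
BYTES_PER_PARAM = 4

trainable_fraction = trainable_params / backbone_params
full_upload_mb = backbone_params * BYTES_PER_PARAM / 1e6
adapter_upload_mb = trainable_params * BYTES_PER_PARAM / 1e6
reduction = full_upload_mb / adapter_upload_mb

print(f"trainable fraction: {trainable_fraction:.1%}")
print(f"full-model upload:  {full_upload_mb:.0f} MB per client per round")
print(f"adapter upload:     {adapter_upload_mb:.0f} MB per client per round")
print(f"reduction:          {reduction:.1f}x")
```

Under the float32 assumption, each client would upload about 22 MB per round instead of about 624 MB.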

Paper Structure

This paper contains 15 sections, 5 equations, 3 figures, 3 tables.

Figures (3)

  • Figure 1: Overview of three deployment paradigms: centralized cloud moderation that uploads raw videos and is vulnerable to man-in-the-middle or server-side misuse, basic federated learning without privacy where model updates can leak content, and our DP- and SA-protected federated pipeline for video violence detection.
  • Figure 2: High-level architecture of the proposed privacy-preserving federated learning pipeline for video violence detection. The system combines on-device VideoMAE-based client training with server-side secure aggregation under differential privacy.
  • Figure 3: Qualitative visualization of violence detection results under non-private and DP-protected federated training on RWF-2000.