FedVideoMAE: Efficient Privacy-Preserving Federated Video Moderation
Ziyuan Tao, Chuanzhi Xu, Sandaru Jayawardana, Wei Bao, Kanchana Thilakarathna, Teng Joon Lim
TL;DR
This work tackles the privacy and bandwidth challenges of cloud-based video moderation by proposing FedVideoMAE, an on-device federated framework that couples self-supervised VideoMAE representations with LoRA-based parameter-efficient fine-tuning (PEFT). By freezing the backbone and training only lightweight adapters, it reduces communication cost by 28.3× relative to full-model federated learning while supporting differential privacy (DP) and secure aggregation. Experiments on RWF-2000 with 40 clients show 77.25% accuracy without DP and 65–66% accuracy under strong privacy (ε ≤ 10). An effective signal-to-noise ratio (SNR) analysis explains how DP noise behaves in the PEFT regime. The results demonstrate practical, defense-in-depth privacy for on-device video moderation and offer design guidance for improving the privacy-utility trade-off in edge analytics.
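The idea of freezing the backbone and training only lightweight adapters can be illustrated with a minimal LoRA layer in NumPy. This is a sketch, not FedVideoMAE's code. The class name `LoRALinear`, the rank `r=8`, the scaling `alpha=16`, and the 768-dimensional layer (the hidden size of a ViT-B encoder) are all assumptions. The pretrained weight `W` stays frozen. Only the low-rank factors `A` and `B` would be trained locally and exchanged with the server.

```python
import numpy as np


class LoRALinear:
    """Frozen dense layer W plus a trainable low-rank update (alpha / r) * B @ A.

    Illustrative sketch: the class name, rank r, and alpha are hypothetical
    choices, not FedVideoMAE's configuration.
    """

    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        # Pretrained backbone weight: frozen, never updated or transmitted.
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # LoRA factors: A is small and random, B starts at zero, so the
        # adapted layer initially reproduces the frozen layer exactly.
        self.A = rng.standard_normal((r, d_in)) * 0.01
        self.B = np.zeros((d_out, r))
        self.scale = alpha / r

    def __call__(self, x):
        # x: (batch, d_in) -> output: (batch, d_out)
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        # Only A and B would be optimized locally and sent to the server.
        return self.A.size + self.B.size

    def total_params(self):
        return self.W.size + self.trainable_params()


layer = LoRALinear(d_in=768, d_out=768, r=8)
x = np.ones((2, 768))
print(layer(x).shape)            # (2, 768)
print(layer.trainable_params())  # 12288
print(layer.total_params())      # 602112
```

For this single 768×768 layer, the adapter holds 12,288 trainable values against 589,824 frozen ones, roughly 2%. Across a full backbone the fraction depends on which layers receive adapters and on any trainable classification head.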
Abstract
The rapid growth of short-form video platforms increases the need for privacy-preserving moderation. Cloud-based pipelines expose raw videos to privacy risks and incur high bandwidth costs and inference latency. To address these challenges, we propose an on-device federated learning framework for video violence detection that integrates self-supervised VideoMAE representations, LoRA-based parameter-efficient adaptation, and defense-in-depth privacy protection. Our approach reduces the trainable parameter count to 5.5M (about 3.5% of a 156M backbone). It incorporates DP-SGD with configurable privacy budgets, together with secure aggregation. Experiments on RWF-2000 with 40 clients achieve 77.25% accuracy without privacy protection and 65–66% under strong differential privacy, while reducing communication cost by 28.3× compared to full-model federated learning. The code is available at https://github.com/zyt-599/FedVideoMAE
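The parameter counts in the abstract allow a back-of-the-envelope check of the communication savings. The sketch below assumes that per-round upload cost is proportional to the number of transmitted parameters. It also assumes that updates are sent as uncompressed float32 values and that all 40 clients upload every round; none of these details are stated in this section. With the rounded counts, the ratio comes out at about 28.4×. This is close to the reported 28.3×, since the published counts are themselves rounded.

```python
# Rounded parameter counts reported in the abstract.
BACKBONE_PARAMS = 156e6   # full model, as sent in full-model federated learning
TRAINABLE_PARAMS = 5.5e6  # adapter-only update

BYTES_PER_PARAM = 4       # assumption: float32 updates, no compression
NUM_CLIENTS = 40          # assumption: every client uploads each round


def upload_bytes_per_round(num_params):
    """Total client-to-server bytes in one round under the assumptions above."""
    return num_params * BYTES_PER_PARAM * NUM_CLIENTS


full = upload_bytes_per_round(BACKBONE_PARAMS)
peft = upload_bytes_per_round(TRAINABLE_PARAMS)
print(f"trainable fraction: {TRAINABLE_PARAMS / BACKBONE_PARAMS:.1%}")  # 3.5%
print(f"full-model upload per round: {full / 1e9:.2f} GB")             # 24.96 GB
print(f"adapter-only upload per round: {peft / 1e9:.2f} GB")           # 0.88 GB
print(f"reduction: {full / peft:.1f}x")                                # 28.4x
```

The reduction factor does not depend on the bytes-per-parameter or client-count assumptions, because both cancel in the ratio. Only the absolute byte totals do.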
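A single DP-SGD update on the adapter parameters can be sketched in the standard form of per-example gradient clipping followed by Gaussian noise. The function name `dp_sgd_step` and all hyperparameter values are illustrative assumptions. FedVideoMAE's actual clipping norm, noise multiplier, sampling scheme, and privacy accountant are not given in this section. The mapping from `noise_multiplier` to a budget such as ε ≤ 10 is omitted.

```python
import numpy as np


def dp_sgd_step(params, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD update on flattened trainable (adapter) parameters.

    per_example_grads has shape (batch, num_params): one flattened gradient
    per training clip. Privacy accounting is not shown.
    """
    batch = per_example_grads.shape[0]
    # 1) Clip each example's gradient to L2 norm at most clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    factors = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * factors
    # 2) Sum, add Gaussian noise with std noise_multiplier * clip_norm, average.
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (clipped.sum(axis=0) + noise) / batch
    # 3) Ordinary gradient step using the privatized gradient.
    return params - lr * noisy_mean


rng = np.random.default_rng(0)
params = np.zeros(16)                   # hypothetical flattened adapter weights
grads = rng.normal(size=(8, 16)) * 5.0  # hypothetical per-clip gradients
params = dp_sgd_step(params, grads, lr=0.1, clip_norm=1.0,
                     noise_multiplier=1.1, rng=rng)
```

Because the noise is isotropic, its expected L2 norm grows roughly as `noise_multiplier * clip_norm * sqrt(d)` for `d` trainable parameters. The clipped gradient sum, by contrast, is bounded by `batch * clip_norm` regardless of `d`. This generic property is one commonly cited reason that restricting training to a few million adapter parameters can help under DP. It is not necessarily the paper's effective-SNR definition.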
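Secure aggregation can be illustrated with pairwise additive masking. Each pair of clients shares a random mask that one client adds and the other subtracts, so the server recovers only the sum of the updates. This is a deliberately simplified sketch under stated assumptions. The single shared NumPy generator stands in for pairwise key agreement, values are floats rather than quantized finite-field elements, and client dropout is not handled. It does not describe which secure-aggregation protocol FedVideoMAE actually uses.

```python
import numpy as np


def masked_updates(updates, seed=0):
    """Apply pairwise masks that cancel in the sum (simplified secure aggregation).

    For each client pair (i, j) with i < j, a shared random mask m_ij is added
    by client i and subtracted by client j. Each masked vector alone looks
    random, but the masked vectors sum to the sum of the true updates.
    """
    n = len(updates)
    rng = np.random.default_rng(seed)  # stand-in for pairwise shared secrets
    masked = [np.asarray(u, dtype=float).copy() for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            m = rng.normal(0.0, 100.0, size=masked[i].shape)
            masked[i] += m
            masked[j] -= m
    return masked


rng = np.random.default_rng(1)
client_updates = [rng.normal(size=5) for _ in range(4)]  # hypothetical adapter updates
masked = masked_updates(client_updates)
server_sum = np.sum(masked, axis=0)
print(np.allclose(server_sum, np.sum(client_updates, axis=0)))  # True
```

After unmasking the sum, a FedAvg-style server would divide by the number of participating clients (or weight by their sample counts) to obtain the averaged adapter update.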
