FedVSR: Towards Model-Agnostic Federated Learning in Video Super-Resolution
Ali Mollaahmadi Dehaghi, Hossein KhademSohi, Reza Razavi, Steve Drew, Mohammad Moshirpour
TL;DR
FedVSR addresses the privacy and efficiency challenges of training high-fidelity video super-resolution (VSR) models in federated settings. It is model-agnostic and stateless. During local training, a lightweight high-frequency loss based on the 3D Discrete Wavelet Transform (DWT) preserves spatio-temporal detail. On the server, a loss-aware adaptive aggregation blends loss-based prioritization with uniform averaging, using a Hellinger-distance-guided threshold. Experiments across multiple VSR models and datasets show consistent improvements in PSNR, SSIM, LPIPS, and VMAF, with near-zero additional communication overhead compared to competing methods and with robustness to data heterogeneity and communication failures. These results position FedVSR as a practical baseline for privacy-preserving low-level vision in federated learning (FL) and a starting point for future high-resolution, heterogeneous FL-VSR deployments.
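To make the aggregation idea above concrete, here is a minimal sketch of one way a Hellinger-guided blend could work. It is an illustrative reading, not the paper's exact rule: the function names, the choice of weights proportional to each client's reported loss, the default threshold `tau`, and the `tau / d` blending factor are all assumptions introduced here.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Clamp at 0 to guard against tiny negative values from rounding.
    return float(np.sqrt(max(0.0, 1.0 - np.sum(np.sqrt(p * q)))))

def blended_weights(losses, tau=0.1):
    """Hypothetical rule: blend loss-based weights with uniform weights."""
    losses = np.asarray(losses, dtype=float)
    uniform = np.full(losses.shape, 1.0 / losses.size)
    # Assumption: weight each client in proportion to its reported loss.
    loss_based = losses / losses.sum()
    d = hellinger(loss_based, uniform)
    # Assumption: trust the loss-based weights fully while they stay within
    # tau of uniform; otherwise shrink them toward uniform averaging.
    alpha = 1.0 if d <= tau else tau / d
    return alpha * loss_based + (1.0 - alpha) * uniform

def aggregate(client_params, losses, tau=0.1):
    """Weighted average of client parameter arrays (all of the same shape)."""
    w = blended_weights(losses, tau)
    return np.tensordot(w, np.stack(client_params), axes=1)
```

With equal client losses this reduces to plain uniform averaging; as the losses become more skewed, the blend caps how far the weights can move away from uniform. Because the server only needs the per-round losses, a rule of this kind keeps no per-client state between rounds.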
Abstract
Video super-resolution (VSR) aims to enhance low-resolution videos by leveraging both spatial and temporal information. While deep learning has led to impressive progress, it typically requires centralized data, which raises privacy concerns. Federated learning (FL) offers a privacy-friendly solution, but general FL frameworks often struggle with low-level vision tasks, resulting in blurry, low-quality outputs. To address this, we introduce FedVSR, the first FL framework specifically designed for VSR. It is model-agnostic and stateless, and introduces a lightweight loss function based on the Discrete Wavelet Transform (DWT) to better preserve high-frequency details during local training. Additionally, a loss-aware aggregation strategy combines the DWT-based and task-specific losses to guide global updates effectively. Extensive experiments across multiple VSR models and datasets show that FedVSR not only improves perceptual video quality (up to +0.89 dB PSNR, +0.0370 SSIM, -0.0347 LPIPS, and +4.98 VMAF) but also achieves these gains with near-zero computation and communication overhead compared to its rivals. These results demonstrate FedVSR's potential to bridge the gap between privacy, efficiency, and perceptual quality, setting a new benchmark for federated learning in low-level vision tasks. The code is available at: https://github.com/alimd94/FedVSR
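As an illustration of the DWT-based loss described above, the following is a minimal NumPy sketch of a high-frequency loss built on a single-level 3D Haar transform. The Haar wavelet, the single decomposition level, the L1 distance, the equal weighting of sub-bands, and all function names are assumptions made for this example; the released code may differ on each of these points.

```python
import numpy as np

def haar_split(x, axis):
    """One-level Haar analysis along `axis`: returns (low, high) bands."""
    n = x.shape[axis]
    # Pair up neighbouring samples; a trailing odd sample is dropped.
    even = np.take(x, np.arange(0, n - 1, 2), axis=axis)
    odd = np.take(x, np.arange(1, n, 2), axis=axis)
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def dwt3d_high_bands(video):
    """video: array of shape (T, H, W). Returns the 7 high-frequency sub-bands."""
    bands = {"": video}
    for axis in range(3):  # time, then height, then width
        split = {}
        for name, arr in bands.items():
            low, high = haar_split(arr, axis)
            split[name + "L"] = low
            split[name + "H"] = high
        bands = split
    # Drop the all-low-pass band; keep bands with at least one high-pass axis.
    return {k: v for k, v in bands.items() if k != "LLL"}

def high_freq_loss(pred, target):
    """Mean L1 distance between high-frequency sub-bands of two clips."""
    p = dwt3d_high_bands(pred)
    t = dwt3d_high_bands(target)
    return float(np.mean([np.mean(np.abs(p[k] - t[k])) for k in p]))
```

Because the all-low-pass band is excluded, a uniform brightness offset between prediction and target contributes nothing to this term; only mismatches in edges, texture, and frame-to-frame change are penalized. In training, a term like this would be added to the task-specific reconstruction loss rather than replace it.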
