FedVSR: Towards Model-Agnostic Federated Learning in Video Super-Resolution

Ali Mollaahmadi Dehaghi, Hossein KhademSohi, Reza Razavi, Steve Drew, Mohammad Moshirpour

TL;DR

FedVSR tackles the privacy and efficiency challenges of training high-fidelity video super-resolution models in federated settings. It is model-agnostic and stateless, introducing a lightweight 3D Discrete Wavelet Transform-based high-frequency loss to preserve temporal-spatial details during local training, along with a loss-aware adaptive aggregation that blends loss-based prioritization with uniform averaging via a Hellinger-distance-guided threshold. Empirical results across multiple VSR models and datasets show consistent improvements in PSNR, SSIM, LPIPS, and VMAF with near-zero additional communication overhead compared to rivals, demonstrating robustness to data heterogeneity and communication failures. These findings establish FedVSR as a practical baseline for privacy-preserving low-level vision in FL, bridging privacy, efficiency, and perceptual quality, and paving the way for future high-resolution, heterogeneous FL-VSR deployments.
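The loss-aware adaptive aggregation described above can be illustrated with a small sketch. The summary does not give the exact rule, so everything specific here is an assumption made for illustration: the function names (`hellinger`, `adaptive_weights`, `aggregate`), the choice to make a client's weight proportional to its reported loss, the default `threshold` value, and the way the blend shrinks toward uniform averaging once the Hellinger distance between the loss-based weights and the uniform weights exceeds the threshold.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)) / 2.0)


def adaptive_weights(client_losses, threshold=0.1):
    """Blend loss-based weights with uniform weights (hypothetical rule, not the paper's)."""
    n = len(client_losses)
    uniform = [1.0 / n] * n
    total = sum(client_losses)
    if total <= 0:
        return uniform
    # ASSUMPTION: a client's priority is proportional to its average local loss.
    loss_based = [loss / total for loss in client_losses]
    dist = hellinger(loss_based, uniform)
    # ASSUMPTION: keep the loss-based weights while they stay close to uniform;
    # otherwise shrink them toward uniform in proportion to how far they drifted.
    mix = 1.0 if dist <= threshold else threshold / dist
    return [mix * p + (1.0 - mix) * u for p, u in zip(loss_based, uniform)]


def aggregate(client_params, weights):
    """Weighted average of per-client parameter vectors given as flat lists of floats."""
    size = len(client_params[0])
    return [sum(w * params[i] for w, params in zip(weights, client_params)) for i in range(size)]


if __name__ == "__main__":
    losses = [0.9, 0.5, 0.4]                      # made-up average local losses
    params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # made-up client parameters
    w = adaptive_weights(losses)
    print(w, aggregate(params, w))
```

Because the server only needs each client's current parameters and a scalar loss, a rule of this shape keeps no per-client history between rounds, which is consistent with the stateless property claimed above.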

Abstract

Video super-resolution (VSR) aims to enhance low-resolution videos by leveraging both spatial and temporal information. While deep learning has led to impressive progress, it typically requires centralized data, which raises privacy concerns. Federated learning (FL) offers a privacy-friendly solution, but general FL frameworks often struggle with low-level vision tasks, resulting in blurry, low-quality outputs. To address this, we introduce FedVSR, the first FL framework specifically designed for VSR. It is model-agnostic and stateless, and introduces a lightweight loss function based on the Discrete Wavelet Transform (DWT) to better preserve high-frequency details during local training. Additionally, a loss-aware aggregation strategy combines both DWT-based and task-specific losses to guide global updates effectively. Extensive experiments across multiple VSR models and datasets show that FedVSR not only improves perceptual video quality (up to +0.89 dB PSNR, +0.0370 SSIM, -0.0347 LPIPS, and +4.98 VMAF) but also achieves these gains with close to zero computation and communication overhead compared to its rivals. These results demonstrate FedVSR's potential to bridge the gap between privacy, efficiency, and perceptual quality, setting a new benchmark for federated learning in low-level vision tasks. The code is available at: https://github.com/alimd94/FedVSR
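To make the DWT-based high-frequency loss more concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a one-level Haar wavelet applied along the time, height, and width axes of a single-channel clip, an L1 distance between sub-bands, and even-sized dimensions; the helper names `haar_split`, `dwt3d_subbands`, and `high_frequency_loss` are hypothetical. The official code at the repository linked above may differ in wavelet, number of levels, and weighting.

```python
import numpy as np


def haar_split(x, axis):
    """One-level Haar analysis along `axis` (length must be even): returns (low, high)."""
    x = np.moveaxis(x, axis, 0)
    even, odd = x[0::2], x[1::2]
    low = (even + odd) / np.sqrt(2.0)
    high = (even - odd) / np.sqrt(2.0)
    return np.moveaxis(low, 0, axis), np.moveaxis(high, 0, axis)


def dwt3d_subbands(video):
    """Split a (T, H, W) array into 8 sub-bands named 'LLL' ... 'HHH' (order: T, H, W)."""
    bands = {"": video}
    for axis in range(3):
        split = {}
        for name, arr in bands.items():
            low, high = haar_split(arr, axis)
            split[name + "L"] = low
            split[name + "H"] = high
        bands = split
    return bands


def high_frequency_loss(pred, target):
    """Mean L1 distance over the 7 sub-bands that contain at least one high-pass filter.

    ASSUMPTION: L1 distance and equal sub-band weighting are illustrative choices.
    """
    pred_bands = dwt3d_subbands(pred)
    target_bands = dwt3d_subbands(target)
    names = [k for k in pred_bands if k != "LLL"]
    return sum(np.mean(np.abs(pred_bands[k] - target_bands[k])) for k in names) / len(names)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((4, 8, 8))   # made-up clip: 4 frames of 8x8 pixels
    blurry = np.full_like(target, target.mean())
    print(high_frequency_loss(blurry, target))
```

In a local training step, a term like this would be added to the model's usual task-specific loss. A uniform brightness shift leaves every high-frequency sub-band unchanged, so the term only penalizes missing detail and motion, not a global offset.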

Paper Structure

This paper contains 38 sections, 16 equations, 6 figures, 7 tables, 1 algorithm.

Figures (6)

  • Figure 1: Comparison of RVRT [liang2022recurrent] output trained with different federated learning algorithms. General FL methods struggle to reconstruct fine details and textures. PSNR, SSIM, and LPIPS metrics are shown in red, blue, and green, respectively.
  • Figure 2: Overview of the proposed FedVSR framework. Each client computes a model-agnostic VSR update augmented with a DWT-based high-frequency loss. Clients also track the average local loss, which is used for loss-aware weighted aggregation at the server. The global model is iteratively refined while maintaining model-agnostic and stateless properties.
  • Figure 3: PSNR across different rounds for various test sets under different settings for VRT [liang2024vrt], RVRT [liang2022recurrent], and IART [xu2024enhancing].
  • Figure 4: FedVSR vs. FedAvg under client upload failures (0–75%), showing FedVSR’s higher robustness.
  • Figure 5: FedVSR vs. FedAvg under client population stress test.
  • ...and 1 more figure