ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

Hui-Po Wang, Sebastian U. Stich, Yang He, Mario Fritz

TL;DR

ProgFed is the first progressive training framework for efficient and effective federated learning. It inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models.

Abstract

Federated learning is a powerful distributed learning scheme that allows numerous edge devices to collaboratively train a model without sharing their data. However, training is resource-intensive for edge devices, and limited network bandwidth is often the main bottleneck. Prior work often overcomes these constraints by condensing the models or messages into compact formats, e.g., by gradient compression or distillation. In contrast, we propose ProgFed, the first progressive training framework for efficient and effective federated learning. It inherently reduces computation and two-way communication costs while maintaining the strong performance of the final models. We theoretically prove that ProgFed converges at the same asymptotic rate as standard training on full models. Extensive results on a broad range of architectures, including CNNs (VGG, ResNet, ConvNets) and U-nets, and on diverse tasks from simple classification to medical image segmentation, show that our highly effective training approach saves up to $20\%$ of computation and up to $63\%$ of communication costs for converged models. As our approach is also complementary to prior work on compression, we can achieve a wide range of trade-offs by combining these techniques, showing reduced communication of up to $50\times$ at only $0.1\%$ loss in utility. Code is available at https://github.com/hui-po-wang/ProgFed.
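To make the communication claim concrete, the following is a minimal arithmetic sketch, not the authors' code. It assumes that in each federated round only the current sub-model is exchanged, that the last stage is the full model, and that head parameters are negligible. The function names, sub-model sizes, and round counts are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
def progressive_comm_cost(stage_params, rounds_per_stage):
    """Total parameters sent in one direction over the whole training run.

    stage_params[s]     -- size of the sub-model used in stage s (hypothetical)
    rounds_per_stage[s] -- number of federated rounds spent in stage s
    """
    if len(stage_params) != len(rounds_per_stage):
        raise ValueError("need exactly one round count per stage")
    return sum(p * r for p, r in zip(stage_params, rounds_per_stage))


def relative_saving(stage_params, rounds_per_stage):
    """Fraction of communication saved versus sending the full model every round."""
    full_size = stage_params[-1]  # assumption: the last stage is the full model
    baseline = full_size * sum(rounds_per_stage)
    return 1.0 - progressive_comm_cost(stage_params, rounds_per_stage) / baseline


# Hypothetical 4-stage schedule; sizes are in millions of parameters.
sizes = [1, 2, 3, 4]
rounds = [25, 25, 25, 125]
saving = relative_saving(sizes, rounds)
print(f"relative communication saving: {saving:.4f}")
```

Because uploads and downloads both carry the same sub-model under these assumptions, the same relative saving applies to each direction. The actual savings reported in the abstract depend on the architectures and schedules used in the paper.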

Paper Structure

This paper contains 24 sections, 2 theorems, 11 equations, 18 figures, 14 tables, 1 algorithm.

Key Result

Theorem 3.3

Let Assumptions asmp_lsmooth and asmp_msbound hold, and let the stepsize in iteration $t$ be $\gamma_t=\alpha_t\gamma$ with $\gamma=\min\{\frac{1}{L}, (\frac{F_0}{\sigma^2 T})^{\frac{1}{2}}\}$ and $\alpha_t=\min\{1, \frac{\langle \nabla f(\mathbf{x}_t)_{\mid E_s}, \nabla f^s(\mathbf{x}_t^s)_{\mid E_s}\rangle}{\ldots}\}$ … where $F_0 := f(\mathbf{x}_0)-\min_\mathbf{x} f(\mathbf{x})$.
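As a small worked example of the stepsize choice, the sketch below evaluates only the base stepsize $\gamma=\min\{\frac{1}{L}, (\frac{F_0}{\sigma^2 T})^{\frac{1}{2}}\}$. The scaling factor $\alpha_t$ is left out because its definition is cut off in the statement above. The function name `base_stepsize`, the handling of the noise-free case, and the numeric inputs are assumptions made for illustration.

```python
import math


def base_stepsize(L, F0, sigma, T):
    """gamma = min{1/L, sqrt(F0 / (sigma^2 * T))}.

    L     -- smoothness constant (must be positive)
    F0    -- initial suboptimality f(x_0) - min_x f(x) (non-negative)
    sigma -- gradient noise level (non-negative)
    T     -- number of iterations (positive)
    """
    if L <= 0 or T <= 0 or F0 < 0 or sigma < 0:
        raise ValueError("require L > 0, T > 0, F0 >= 0, sigma >= 0")
    if sigma == 0:
        # Assumption: without noise the second term is unbounded, so 1/L applies.
        return 1.0 / L
    return min(1.0 / L, math.sqrt(F0 / (sigma ** 2 * T)))


# Hypothetical constants: a long horizon makes the noise term the binding one.
gamma_long = base_stepsize(L=10.0, F0=4.0, sigma=2.0, T=10_000)
# With a short horizon the smoothness term 1/L is the smaller of the two.
gamma_short = base_stepsize(L=10.0, F0=4.0, sigma=2.0, T=4)
```

With these inputs the noise term equals $\sqrt{4/(4\cdot 10^4)} = 0.01$ for the long horizon, while for $T=4$ it equals $0.5$ and the bound $1/L = 0.1$ takes over.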

Figures (18)

  • Figure 1: An overview of ProgFed on (a) feed-forward networks and (b) U-nets (symmetric growing illustrated). We progressively train a deep neural network from the shallower sub-models, e.g., $\mathcal{M}^1$ consisting of the main block $E_1$ and head $G_1$ (Eq. \ref{eq:def_submodel}), gradually expanding to the full model $\mathcal{M}^S=\mathcal{M}$ (Eq. \ref{eq:def_model}). Note that the local heads $G_i$ in feed-forward networks are only used for training sub-models and are discarded when progressing to the next stage.
  • Figure 2: Accuracy (%) vs. GFLOPs on CIFAR-100 in the centralized setting.
  • Figure 3: Computation cost reduction when reaching $98\%$, $99\%$, $99.95\%$, and the \textit{best} of the baseline (full-model training) performance in the centralized setting on CIFAR-100.
  • Figure 4: Communication cost reduction when reaching $98\%$, $99\%$, $99.95\%$, and the \textit{best} of the baseline performance in the federated setting.
  • Figure 5: Communication cost vs. Accuracy (%) in federated settings on EMNIST (3400 clients, non-IID), CIFAR-10 (100 clients, IID), CIFAR-100 (500 clients, non-IID), and BraTS (10 clients, IID).
  • ...and 13 more figures
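
The sub-model structure described in the Figure 1 caption can be sketched with toy scalar functions. This is a simplified illustration, not the authors' implementation. It assumes that the stage-$s$ sub-model applies the main blocks $E_1, \dots, E_s$ in order and then the stage-specific head $G_s$, and that moving to the next stage drops the previous head. The name `build_submodel` and the toy blocks are hypothetical.

```python
from typing import Callable, Sequence

Block = Callable[[float], float]


def build_submodel(blocks: Sequence[Block], heads: Sequence[Block], stage: int) -> Block:
    """Return the stage-s sub-model: blocks E_1..E_s followed by head G_s (1-based s)."""
    if len(blocks) != len(heads):
        raise ValueError("need exactly one head per main block")
    if not 1 <= stage <= len(blocks):
        raise ValueError("stage must be between 1 and the number of blocks")
    active = list(blocks[:stage])  # main blocks trained so far
    head = heads[stage - 1]        # only the current head is used; earlier ones are dropped

    def forward(x: float) -> float:
        for block in active:
            x = block(x)
        return head(x)

    return forward


# Toy stand-ins for E_1..E_3 and G_1..G_3 (hypothetical, scalar-valued).
blocks = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
heads = [lambda x: x * 10, lambda x: x * 100, lambda x: x]

m1 = build_submodel(blocks, heads, stage=1)      # E_1 then G_1
m2 = build_submodel(blocks, heads, stage=2)      # E_1, E_2 then G_2
m_full = build_submodel(blocks, heads, stage=3)  # all blocks: the full model
```

In a real network the blocks would be groups of layers and the heads small classifiers, but the control flow of growing the model stage by stage is the same under the stated assumptions.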

Theorems & Definitions (4)

  • Theorem 3.3
  • Lemma 1.1
  • proof
  • proof