FedRepOpt: Gradient Re-parametrized Optimizers in Federated Learning

Kin Wai Lau, Yasar Abbas Ur Rehman, Pedro Porto Buarque de Gusmão, Lai-Man Po, Lan Ma, Yuyang Xie

TL;DR

This work proposes FedRepOpt, a gradient re-parameterized optimizer for FL. It trains a simple local model to performance similar to that of a complex model by modifying the optimizer's gradients according to a set of model-specific hyperparameters obtained from the complex model.

Abstract

Federated Learning (FL) has emerged as a privacy-preserving method for training machine learning models in a distributed manner on edge devices. However, on-device models face inherent computational power and memory limitations, potentially resulting in constrained gradient updates. As the model's size increases, the frequency of gradient updates on edge devices decreases, ultimately leading to suboptimal training outcomes during any particular FL round. This limits the feasibility of deploying advanced and large-scale models on edge devices, hindering the potential for performance enhancements. To address this issue, we propose FedRepOpt, a gradient re-parameterized optimizer for FL. The gradient re-parameterized method trains a simple local model to performance similar to that of a complex model by modifying the optimizer's gradients according to a set of model-specific hyperparameters obtained from the complex model. In this work, we focus on VGG-style and Ghost-style models in the FL environment. Extensive experiments demonstrate that models using FedRepOpt obtain a significant performance boost of 16.7% and 11.4% over the RepGhost-style and RepVGG-style networks, respectively, while also converging 11.7% and 57.4% faster than their complex-structure counterparts.
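
The idea of moving structure into the optimizer can be illustrated with a toy case. The sketch below is not the paper's implementation: it assumes a linear two-branch block with hypothetical constant scales `alpha_a` and `alpha_b`, a squared-error loss, and plain SGD. Under those assumptions, training the two branches and merging them gives the same weights as training one merged operator whose gradient is multiplied by `alpha_a**2 + alpha_b**2`. All function and variable names are illustrative.

```python
import numpy as np

def loss_grad(w, x, t):
    """Gradient of 0.5 * ||w @ x - t||^2 with respect to w."""
    return np.outer(w @ x - t, x)

def train_two_branch(w_a, w_b, alpha_a, alpha_b, x, t, lr, steps):
    """Plain SGD on a block computing alpha_a * (w_a @ x) + alpha_b * (w_b @ x)."""
    w_a, w_b = w_a.copy(), w_b.copy()
    for _ in range(steps):
        g = loss_grad(alpha_a * w_a + alpha_b * w_b, x, t)
        # Chain rule: each branch sees the merged gradient times its scale.
        w_a -= lr * alpha_a * g
        w_b -= lr * alpha_b * g
    return alpha_a * w_a + alpha_b * w_b  # merge branches into one operator

def train_single(w, gm, x, t, lr, steps):
    """SGD on a single operator whose gradient is scaled by a constant gm."""
    w = w.copy()
    for _ in range(steps):
        w -= lr * gm * loss_grad(w, x, t)
    return w

rng = np.random.default_rng(0)
w_a, w_b = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
x, t = rng.normal(size=4), rng.normal(size=3)
alpha_a, alpha_b = 0.5, 1.5  # hypothetical constant scales

two_branch_result = train_two_branch(w_a, w_b, alpha_a, alpha_b, x, t, lr=0.01, steps=20)
single_result = train_single(
    alpha_a * w_a + alpha_b * w_b,   # initialize as the merged block
    alpha_a**2 + alpha_b**2,         # gradient multiplier for this toy case
    x, t, lr=0.01, steps=20,
)
print(np.allclose(two_branch_result, single_result))
```

The single operator needs only one weight tensor during training, which is the property that makes this family of methods attractive for memory-limited clients.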

Paper Structure (23 sections, 2 equations, 5 figures, 8 tables, 1 algorithm)

This paper contains 23 sections, 2 equations, 5 figures, 8 tables, 1 algorithm.

Figures (5)

  • Figure 1: Overview of the federated repoptimizer framework. The training pipeline comprises six steps: (1) The server performs the model-specific hyperparameter search (i.e., $\alpha_{3}$ and $\alpha_{1}$) using the HS dataset, taken from a public dataset or a server dataset. (2) The parallel-branch CSLA structure is converted into a single operator plus an equivalent constant training-dynamics hyperparameter called the gradient multiplier (GM). (3) The target training structure is initialized with the equivalent CSLA structure. (4) The initialized model and the GM are sent to each client. (5) During client training, the gradient is multiplied by the constant scalar GM. After training, the updated models are sent to the server. (6) The server aggregates all the client models to obtain a new global model. These steps repeat until the model converges.
  • Figure 1: Network architecture of the Fed-RepGhost-Tr (Left), Fed-CSLA-Ghost (Middle) and Fed-RepGhost-Inf / FedRepOpt-Ghost (Right) chen2022repghost. SE: Squeeze-and-excitation networks hu2018squeeze. SBlock: Shortcut block chen2022repghost. SL with dotted line: Constant scaling layer ding2023reparameterizing. SL: Trainable scaling layer ding2023reparameterizing. DW-Conv: Depth-wise convolution.
  • Figure 2: Learning behavior of CSLA-based models and their rep-optimized versions. (a) Ghost model, IID (b) VGG model, IID (c) Ghost model, NIID (d) VGG model, NIID
  • Figure 2: Network architecture of the Fed-RepVGG-Tr (Left), Fed-CSLA-VGG (Middle) and Fed-RepVGG-Inf / FedRepOpt-VGG (Right) ding2023reparameterizing. SL with dotted line: Constant scaling layer. SL: Trainable scaling layer.
  • Figure 3: Learning behavior of CSLA-based models and their rep-optimized versions on cross-device Non-IID setting
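
Steps (4) to (6) of the six-step pipeline in the framework overview caption can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' code: the model is a linear least-squares regressor, the gradient multiplier `gm` is a hypothetical scalar, and the server uses a sample-size-weighted average (FedAvg-style), which the caption does not specify. All names are illustrative.

```python
import numpy as np

def client_update(global_w, gm, data, lr=0.05, local_steps=5):
    """Step (5): local SGD where every gradient is multiplied by the constant gm."""
    x, y = data
    w = global_w.copy()
    for _ in range(local_steps):
        grad = x.T @ (x @ w - y) / len(x)  # least-squares gradient
        w -= lr * gm * grad
    return w

def aggregate(client_ws, client_sizes):
    """Step (6): weighted average of client models (assumed FedAvg-style)."""
    weights = np.asarray(client_sizes, dtype=float)
    weights /= weights.sum()
    return sum(wt * w for wt, w in zip(weights, client_ws))

def run_rounds(global_w, gm, client_data, rounds=20):
    """Repeat steps (4) to (6): broadcast, train locally, aggregate."""
    for _ in range(rounds):
        updates = [client_update(global_w, gm, d) for d in client_data]
        global_w = aggregate(updates, [len(d[0]) for d in client_data])
    return global_w

# Synthetic clients sharing one underlying linear model (illustrative data).
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
client_data = []
for n in (20, 30, 50):
    x = rng.normal(size=(n, 3))
    client_data.append((x, x @ w_true))

w0 = np.zeros(3)
gm = 2.0  # hypothetical gradient multiplier sent by the server
final_w = run_rounds(w0, gm, client_data)
start_err = np.linalg.norm(w0 - w_true)
final_err = np.linalg.norm(final_w - w_true)
print(final_err < start_err)
```

In a real model the multiplier would be a per-layer tensor derived from the searched hyperparameters rather than one scalar; the scalar here only keeps the sketch short.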