FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning

Liuzhi Zhou; Yu He; Kun Zhai; Xiang Liu; Sen Liu; Xingjun Ma; Guangnan Ye; Yu-Gang Jiang; Hongfeng Chai

TL;DR

This work tackles the challenge of achieving fast convergence without sacrificing stability in federated learning (FL) by introducing FedCAda, an Adam-based client-side adaptive optimizer combined with server-side aggregation of the adaptive parameters. On clients, FedCAda adjusts the correction of the Adam moment estimates $m$ and $v$ through configurable denominator adjustments that are strongest early in training (when clients hold little information from other clients) and gradually relax as global information accumulates; the server aggregates both the model weights and the adaptive parameters to reduce heterogeneity across clients. Empirical results on CIFAR-10, FashionMNIST, and Shakespeare show that FedCAda outperforms state-of-the-art adaptive FL methods in adaptability, convergence speed, and stability in both cross-silo and cross-device settings, particularly under non-IID data. The study also compares several adjustment functions for $m$ and $v$; ablation analyses indicate that adding an adjustment term to the correction denominator effectively stabilizes updates and that some choices of function yield marginal additional gains, underscoring how much these design choices affect FL robustness and efficiency.
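To make the idea of a "denominator adjustment" concrete, here is a minimal sketch, assuming the adjustment is a non-negative term added to Adam's bias-correction denominators $1-\beta_1^t$ (for $m$) and $1-\beta_2^t$ (for $v$). The paper's actual adjustment functions are not reproduced here: `linear_decay`, `delta_m`, and `delta_v` are hypothetical names chosen for illustration, and the linear schedule is only a stand-in for a function that is large in early rounds and shrinks as training proceeds. Note that because $\hat v$ sits under a square root in the update's denominator, a positive `delta_v` shrinks $\hat v$ and therefore enlarges the step, so the two schedules interact; the usage below adjusts only $m$.

```python
import math
from typing import List, Tuple

Vector = List[float]


def linear_decay(round_idx: int, total_rounds: int) -> float:
    """Hypothetical schedule: 1.0 in round 0, falling linearly to 0.0 at round T."""
    return max(0.0, 1.0 - round_idx / total_rounds)


def adjusted_adam_step(
    params: Vector,
    grads: Vector,
    m: Vector,
    v: Vector,
    t: int,
    delta_m: float = 0.0,
    delta_v: float = 0.0,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[Vector, Vector, Vector]:
    """One Adam step (t >= 1) with extra terms added to the bias-correction denominators.

    delta_m = delta_v = 0 recovers standard Adam.
    """
    denom_m = (1.0 - beta1 ** t) + delta_m  # larger denominator -> smaller m_hat
    denom_v = (1.0 - beta2 ** t) + delta_v  # larger denominator -> smaller v_hat (larger step)
    new_p, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        m_i = beta1 * m_i + (1.0 - beta1) * g  # first moment estimate
        v_i = beta2 * v_i + (1.0 - beta2) * g * g  # second moment estimate
        m_hat = m_i / denom_m
        v_hat = v_i / denom_v
        new_p.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_p, new_m, new_v


# Usage: first step from zero state, plain Adam vs. an early-round adjustment on m only.
T = 200
plain, _, _ = adjusted_adam_step([0.0], [1.0], [0.0], [0.0], t=1)
early, _, _ = adjusted_adam_step(
    [0.0], [1.0], [0.0], [0.0], t=1, delta_m=linear_decay(0, T)
)
```

With `delta_m` equal to 1.0 in round 0, the corrected first moment is divided by roughly 1.1 instead of 0.1, so the first update is about ten times smaller than plain Adam's; as the hypothetical schedule decays to zero, the step converges back to standard Adam.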

Abstract

Federated learning (FL) has emerged as a prominent approach for collaboratively training machine learning models across distributed clients while preserving data privacy. However, balancing acceleration and stability is a significant challenge in FL, especially on the client side. In this paper, we introduce FedCAda, a federated client adaptive algorithm designed to tackle this challenge. FedCAda builds on the Adam algorithm: on the client side it adjusts the correction of the first moment estimate $m$ and the second moment estimate $v$, and on the server side it aggregates the adaptive-algorithm parameters, aiming to accelerate convergence and improve communication efficiency while preserving stability and performance. Additionally, we investigate several algorithms that incorporate different adjustment functions. This comparative analysis reveals that, because client models contain little information from other clients in the early stages of federated learning, stronger constraints need to be imposed on the parameters of the adaptive algorithm at that point. As federated learning progresses and clients gather more global information, FedCAda gradually reduces its influence on the adaptive parameters. These findings offer insights for making algorithmic improvements more robust and efficient. Through extensive experiments on computer vision (CV) and natural language processing (NLP) datasets, we demonstrate that FedCAda outperforms state-of-the-art methods in adaptability, convergence, stability, and overall performance. This work contributes to adaptive algorithms for federated learning and encourages further exploration.
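The server-side step aggregates not only the model weights but also the clients' Adam moments $m$ and $v$, then redistributes the averages. The sketch below shows one way to do this, assuming each client uploads a flat payload keyed `"weights"`, `"m"`, and `"v"` (hypothetical keys) and that the average is weighted by client sample counts as in FedAvg; the source only says the parameters are averaged, so the sample-size weighting is an assumption (pass equal counts for a uniform mean).

```python
from typing import Dict, List, Sequence

# Hypothetical per-client payload: model weights plus Adam moments, flattened to lists.
ClientState = Dict[str, List[float]]


def aggregate_states(
    states: Sequence[ClientState], num_samples: Sequence[int]
) -> ClientState:
    """Sample-size-weighted average of every entry ("weights", "m", "v", ...) across clients."""
    if not states or len(states) != len(num_samples):
        raise ValueError("need one sample count per client state")
    total = float(sum(num_samples))
    aggregated: ClientState = {}
    for key in states[0]:
        acc = [0.0] * len(states[0][key])
        for state, n in zip(states, num_samples):
            weight = n / total  # this client's share of the total data
            for i, value in enumerate(state[key]):
                acc[i] += weight * value
        aggregated[key] = acc
    return aggregated


# Usage: two clients holding 1 and 3 samples.
clients = [
    {"weights": [1.0], "m": [0.0, 4.0], "v": [1.0]},
    {"weights": [3.0], "m": [4.0, 0.0], "v": [1.0]},
]
global_state = aggregate_states(clients, [1, 3])
```

Averaging $m$ and $v$ alongside the weights means every client starts the next round from the same optimizer state, which is the mechanism the summary credits with reducing heterogeneity across clients.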

Paper Structure

This paper contains 16 sections, 7 equations, 6 figures, 3 tables, and 2 algorithms.

Introduction
Related Work
FedAvg-Based Federated Learning
Adaptive Federated Learning
Method
Problem Statement
FedCAda
Server-side
Client-side
Experiments
Setup
Performance Evaluation
Main Results
Results for Different Model Architectures
Ablation Study
...and 1 more section

Figures (6)

Figure 1: Overview of the proposed client-side adaptive federated learning method (FedCAda). Right: The server plays two roles: ① it aggregates the model weights from the clients and distributes them to each client for the next round; ② it aggregates the Adam optimizer parameters $m$ and $v$ from the clients and likewise distributes the averaged parameters to each client for the next round. Left: Clients train on their local data; in the parameter-update step of the backward pass, the adjusted Adam optimizer replaces the standard SGD optimizer used in FedAvg, yielding client-side adaptive optimization.
Figure 2: CKA similarity of the Adam optimizer's first moment estimate $m$ among 10 clients after 200 training rounds. CKA outputs a similarity score between 0 (not similar at all) and 1 (identical).
Figure 3: Curves of the different functions used in the adjustment functions, with $\beta=0.9$ and $T=200$.
Figure 4: Training loss (left) and global-model test accuracy (right) curves for FedCAda and other federated learning baselines trained on CIFAR-10 with $E=3$ and $T=200$.
Figure 5: Training loss (left) and global-model test accuracy (right) curves for FedCAda and other federated learning baselines trained on FashionMNIST with $E=3$ and $T=200$.
...and 1 more figure
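Figure 2 compares the clients' first moment estimates with CKA (centered kernel alignment). As a hedged sketch, the block below computes linear CKA, $\mathrm{CKA}(X,Y)=\lVert Y^\top X\rVert_F^2 / (\lVert X^\top X\rVert_F\,\lVert Y^\top Y\rVert_F)$ on column-centered matrices, using the equivalent Gram-matrix form $\langle K, L\rangle_F / (\lVert K\rVert_F \lVert L\rVert_F)$ with $K = XX^\top$ and $L = YY^\top$. Whether the figure uses linear or kernel CKA, and how each client's $m$ is arranged into a matrix, are not stated in this summary; treating a layer's moment tensor as a matrix with one row per output unit is an assumption, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def _center_columns(x: Matrix) -> Matrix:
    """Subtract each column's mean so every feature has zero mean."""
    n = len(x)
    means = [sum(row[j] for row in x) / n for j in range(len(x[0]))]
    return [[row[j] - means[j] for j in range(len(row))] for row in x]


def _gram(x: Matrix) -> Matrix:
    """Linear-kernel Gram matrix K = X X^T."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in x] for ri in x]


def linear_cka(x: Matrix, y: Matrix) -> float:
    """Linear CKA between two matrices with the same number of rows."""
    k_mat = _gram(_center_columns(x))
    l_mat = _gram(_center_columns(y))
    # Frobenius inner product <K, L>_F and the two Frobenius norms.
    cross = sum(
        k_ij * l_ij
        for k_row, l_row in zip(k_mat, l_mat)
        for k_ij, l_ij in zip(k_row, l_row)
    )
    norm_k = math.sqrt(sum(val * val for row in k_mat for val in row))
    norm_l = math.sqrt(sum(val * val for row in l_mat for val in row))
    return cross / (norm_k * norm_l)
```

Because $K$ and $L$ are positive semidefinite, the score lies in $[0, 1]$, and it is invariant to isotropic scaling and orthogonal transforms of the columns. Evaluating `linear_cka(m_i, m_j)` for every pair of the 10 clients would fill a 10×10 similarity matrix of the same form as Figure 2.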
