FedCAda: Adaptive Client-Side Optimization for Accelerated and Stable Federated Learning
Liuzhi Zhou, Yu He, Kun Zhai, Xiang Liu, Sen Liu, Xingjun Ma, Guangnan Ye, Yu-Gang Jiang, Hongfeng Chai
TL;DR
This work tackles the challenge of achieving fast convergence without sacrificing stability in federated learning by introducing FedCAda, an Adam-based client-side adaptive optimizer with server-side aggregation of adaptive parameters. FedCAda constrains the correction of the Adam moments $m$ and $v$ on clients through configurable denominator adjustments that are stronger early (when client information is limited) and gradually relax as global information accumulates, while the server aggregates both model weights and adaptive parameters to reduce heterogeneity. Empirical results on CIFAR-10, FashionMNIST, and Shakespeare show that FedCAda outperforms state-of-the-art adaptive FL methods in terms of adaptability, convergence speed, and stability across both cross-silo and cross-device settings, particularly under non-IID data. The study also explores several adjustment functions for $m$ and $v$, with ablation analyses indicating that adding to the denominator effectively stabilizes updates and that certain options yield marginal gains, highlighting the practical impact of design choices on FL robustness and efficiency.
Abstract
Federated learning (FL) has emerged as a prominent approach for collaboratively training machine learning models across distributed clients while preserving data privacy. However, balancing acceleration and stability remains a significant challenge in FL, especially on the client side. In this paper, we introduce FedCAda, a federated client-side adaptive algorithm designed to tackle this challenge. FedCAda builds on the Adam algorithm: it adjusts the correction process of the first moment estimate $m$ and the second moment estimate $v$ on the client side and aggregates the adaptive algorithm's parameters on the server side, aiming to accelerate convergence and improve communication efficiency while ensuring stability and performance. Additionally, we investigate several variants of the algorithm that incorporate different adjustment functions. This comparative analysis reveals that, because client models contain limited information from other clients during the initial stages of federated learning, stronger constraints need to be imposed on the parameters of the adaptive algorithm. As federated learning progresses and clients gather more global information, FedCAda gradually reduces its influence on the adaptive parameters. These findings provide insights for enhancing the robustness and efficiency of algorithmic improvements. Through extensive experiments on computer vision (CV) and natural language processing (NLP) datasets, we demonstrate that FedCAda outperforms state-of-the-art methods in terms of adaptability, convergence, stability, and overall performance. This work contributes to adaptive algorithms for federated learning and encourages further exploration.
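The mechanism summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the exponential `decaying_adjustment` function, its `scale` and `rate` parameters, the choice to add the same term to both correction denominators, and the dataset-size weighting in `server_aggregate` are all assumptions made here for illustration. The sketch only shows the general shape of the idea: an extra term in the Adam bias-correction denominators that is large in early steps and fades later, plus server-side averaging of $m$ and $v$ alongside the model weights.

```python
import math


def decaying_adjustment(step, scale=1.0, rate=0.1):
    """Hypothetical adjustment term: large at early steps, decays toward 0."""
    return scale * math.exp(-rate * step)


def corrected_moments(m, v, step, beta1=0.9, beta2=0.999,
                      adjust=decaying_adjustment):
    """Adam bias correction with an extra term added to each denominator.

    With adjust(step) == 0 this reduces to the standard Adam correction
    m / (1 - beta1**step) and v / (1 - beta2**step). `step` starts at 1.
    """
    delta = adjust(step)
    m_hat = [mi / (1.0 - beta1 ** step + delta) for mi in m]
    v_hat = [vi / (1.0 - beta2 ** step + delta) for vi in v]
    return m_hat, v_hat


def client_step(w, grad, m, v, step, lr=0.01, beta1=0.9, beta2=0.999,
                eps=1e-8, adjust=decaying_adjustment):
    """One local Adam-style update on plain lists of floats."""
    # Standard exponential moving averages of the gradient and its square.
    m = [beta1 * mi + (1.0 - beta1) * gi for mi, gi in zip(m, grad)]
    v = [beta2 * vi + (1.0 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    # Adjusted correction (assumed form, see docstring above).
    m_hat, v_hat = corrected_moments(m, v, step, beta1, beta2, adjust)
    w = [wi - lr * mh / (math.sqrt(vh) + eps)
         for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v


def server_aggregate(states, sizes):
    """Average (w, m, v) tuples across clients.

    Weighting by client dataset size is an assumption (FedAvg-style).
    """
    total = float(sum(sizes))

    def avg(k):
        dim = len(states[0][k])
        return [sum(n * s[k][i] for s, n in zip(states, sizes)) / total
                for i in range(dim)]

    return avg(0), avg(1), avg(2)
```

In this sketch, the corrected moments at early steps are smaller in magnitude than under standard Adam correction, and the two coincide once the adjustment term has decayed. The effect on the actual step size depends on how $m$ and $v$ are adjusted relative to each other, so no claim about step size is made here.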
