Gradient Correction in Federated Learning with Adaptive Optimization
Evan Chen, Shiqiang Wang, Jianing Zhang, Dong-Jun Han, Chaoyue Liu, Christopher Brinton
TL;DR
This paper addresses the detrimental effect of data heterogeneity in federated learning when using adaptive optimizers such as Adam. It introduces FAdamGC, a gradient-corrected Adam algorithm that injects a pre-estimation drift correction into the local updates and employs gradient-level correction buffers, with selective tracking to reduce communication. The authors provide convergence guarantees for non-convex objectives and show linear speedup under mild assumptions, highlighting improvements over naively applied SGD-based corrections. Empirical results across image and language tasks demonstrate that FAdamGC achieves faster convergence and better communication efficiency than baselines, particularly in highly non-i.i.d. settings. The work advances robust, efficient federated optimization by integrating gradient correction with adaptive methods, though it relies on bounded-gradient assumptions that future work may relax.
Abstract
In federated learning (FL), model training performance is strongly impacted by data heterogeneity across clients. Client-drift compensation methods have recently emerged as a solution to this issue, introducing correction terms into local model updates. To date, these methods have only been considered under stochastic gradient descent (SGD)-based model training, while modern FL frameworks also employ adaptive optimizers (e.g., Adam) for improved convergence. However, due to the complex interplay between first and second moments found in most adaptive optimization methods, naively injecting correction terms can lead to performance degradation in heterogeneous settings. In this work, we propose FAdamGC, the first algorithm to integrate drift compensation into adaptive federated optimization. The key idea of FAdamGC is to inject a pre-estimation correction term that aligns with the moment structure of adaptive methods. We provide a rigorous convergence analysis of our algorithm under non-convex settings, showing that FAdamGC achieves a better convergence rate under milder assumptions than naively porting SGD-based correction algorithms into adaptive optimizers. Our experimental results demonstrate that FAdamGC consistently outperforms existing methods in total communication and computation cost across varying levels of data heterogeneity, showing the efficacy of correcting gradient information in federated adaptive optimization.
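To make the distinction between pre-estimation correction and naive injection concrete, the following is a minimal sketch and not the authors' algorithm. It assumes a SCAFFOLD-style correction vector (a global gradient estimate minus a local one), a plain Adam step without bias correction, and hypothetical function names (`adam_step`, `pre_estimation_step`, `naive_post_step`); none of these details are given in the abstract.

```python
import math


def adam_step(x, g, m, v, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One plain Adam step (no bias correction) on lists of floats."""
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    x = [xi - lr * mi / (math.sqrt(vi) + eps) for xi, mi, vi in zip(x, m, v)]
    return x, m, v


def pre_estimation_step(x, grad, correction, m, v, lr=0.01):
    """Assumed reading of 'pre-estimation': correct the gradient first,
    so both Adam moments are estimated from the corrected gradient."""
    g = [gi + ci for gi, ci in zip(grad, correction)]
    return adam_step(x, g, m, v, lr=lr)


def naive_post_step(x, grad, correction, m, v, lr=0.01):
    """Naive port of an SGD-style correction: Adam runs on the raw local
    gradient, then the correction is applied as a separate SGD-like shift."""
    x, m, v = adam_step(x, grad, m, v, lr=lr)
    x = [xi - lr * ci for xi, ci in zip(x, correction)]
    return x, m, v


# Toy values, purely illustrative.
local_grad = [1.0, -2.0]
global_grad_estimate = [0.5, 0.5]
correction = [gg - lg for gg, lg in zip(global_grad_estimate, local_grad)]

zeros = [0.0, 0.0]
x_pre, _, _ = pre_estimation_step(zeros, local_grad, correction, zeros, zeros)
x_naive, _, _ = naive_post_step(zeros, local_grad, correction, zeros, zeros)
print("pre-estimation:", x_pre)
print("naive:", x_naive)
```

In this toy setup the pre-estimation variant moves both coordinates in the descent direction of the assumed global gradient, while the naive variant moves the second coordinate the opposite way, because the second-moment scaling is computed from the uncorrected local gradient and the correction is applied at a different scale. This only illustrates the kind of mismatch the abstract alludes to; it says nothing about the paper's actual update rule or results.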
