TACO: Tackling Over-correction in Federated Learning with Tailored Adaptive Correction
Weijie Liu, Ziwei Zhan, Carlee Joe-Wong, Edith Ngai, Jingpu Duan, Deke Guo, Xu Chen, Xiaoxi Zhang
TL;DR
This paper tackles federated learning under non-IID data by revealing a hidden over-correction phenomenon caused by the uniform correction coefficients used in existing methods. It introduces TACO, a lightweight algorithm that assigns client-specific correction factors based on local gradient magnitude and direction, uses a tailored aggregation rule, and includes freeloader detection, steering local models toward the global optimum with minimal overhead. The authors provide a convergence analysis showing how over-correction can harm convergence and demonstrate, across eight datasets and 20–100 clients, that TACO delivers superior round-to-accuracy and time-to-accuracy performance while remaining robust to adversarial behavior. The work offers practical improvements for edge FL by balancing convergence, efficiency, and resilience to freeloaders.
Abstract
Non-independent and identically distributed (non-IID) data across edge clients have long posed significant challenges to federated learning (FL) training in edge computing environments. Prior works have proposed various methods to mitigate this statistical heterogeneity. Although these methods can achieve good theoretical performance, we provide the first investigation into a hidden over-correction phenomenon caused by the uniform model correction coefficients that existing methods apply across clients. Such over-correction can degrade model performance and even prevent the model from converging. To address this, we propose TACO, a novel algorithm that handles the non-IID nature of clients' data through fine-grained, client-specific gradient correction and model aggregation, steering local models towards a more accurate global optimum. Moreover, we verify that leading FL algorithms generally achieve better model accuracy when measured in communication rounds than in wall-clock time, because of the extra computation overhead they impose on clients. To improve training efficiency, TACO uses a lightweight model correction and tailored aggregation approach that requires minimal computation overhead and no information beyond the synchronized model parameters. To validate TACO's effectiveness, we present the first FL convergence analysis that reveals the root cause of over-correction. Extensive experiments across various datasets confirm TACO's superior and stable performance in practice.
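To make the contrast between a uniform and a client-specific correction coefficient concrete, the following is a minimal sketch and not TACO's actual rule, which this abstract does not specify. It assumes a hypothetical coefficient derived only from the cosine alignment between a client's local update and the global update direction: a well-aligned client receives little correction, while a poorly aligned one receives more. The names `tailored_coefficients`, `corrected_update`, and `max_coeff` are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def tailored_coefficients(local_updates, global_update, max_coeff=1.0):
    """Hypothetical per-client coefficients (an assumption, not the paper's formula).

    Alignment lies in [-1, 1]; it is mapped linearly to [0, max_coeff] so that
    a fully aligned client gets 0 and a fully opposed client gets max_coeff.
    """
    return [
        max_coeff * (1.0 - cosine(u, global_update)) / 2.0
        for u in local_updates
    ]


def corrected_update(local_update, global_update, coeff):
    """Pull a local update toward the global direction by the given coefficient."""
    return [
        (1.0 - coeff) * a + coeff * g
        for a, g in zip(local_update, global_update)
    ]


if __name__ == "__main__":
    global_update = [1.0, 0.0]
    local_updates = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]]

    coeffs = tailored_coefficients(local_updates, global_update)
    for update, coeff in zip(local_updates, coeffs):
        print(update, coeff, corrected_update(update, global_update, coeff))
```

Under a uniform scheme, every client would share one coefficient, so the already aligned first client would be shifted as strongly as the opposed second client. In this sketch the first client is left untouched, which is the intuition behind avoiding over-correction.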
