TACO: Tackling Over-correction in Federated Learning with Tailored Adaptive Correction

Weijie Liu, Ziwei Zhan, Carlee Joe-Wong, Edith Ngai, Jingpu Duan, Deke Guo, Xu Chen, Xiaoxi Zhang

TL;DR

This paper tackles federated learning under non-IID data by unveiling a hidden over-correction phenomenon caused by uniform correction coefficients in existing methods. It introduces TACO, a lightweight algorithm that assigns client-specific correction factors based on local gradient magnitude and direction. TACO also uses a tailored aggregation rule and freeloader detection to steer local models toward the global optimum with minimal overhead. The authors provide a convergence analysis showing how over-correction can harm convergence. Across eight datasets and 20–100 clients, they show that TACO delivers superior round-to-accuracy and time-to-accuracy performance while remaining robust to adversarial behavior. The work offers practical improvements for edge FL by balancing convergence, efficiency, and resilience to freeloaders.

Abstract

Non-independent and identically distributed (non-IID) data across edge clients have long posed significant challenges to federated learning (FL) training in edge computing environments. Prior works have proposed various methods to mitigate this statistical heterogeneity. While these works can achieve good theoretical performance, in this work we provide the first investigation into a hidden over-correction phenomenon caused by the uniform model correction coefficients that existing methods apply across clients. Such over-correction can degrade model performance and even prevent the model from converging. To address this, we propose TACO, a novel algorithm that addresses the non-IID nature of clients' data by implementing fine-grained, client-specific gradient correction and model aggregation, steering local models towards a more accurate global optimum. Moreover, we verify that leading FL algorithms generally achieve better model accuracy when measured in communication rounds than in wall-clock time, because of the extra computation overhead they impose on clients. To improve training efficiency, TACO deploys a lightweight model correction and tailored aggregation approach that requires minimal computation overhead and no extra information beyond the synchronized model parameters. To validate TACO's effectiveness, we present the first FL convergence analysis that reveals the root cause of over-correction. Extensive experiments across various datasets confirm TACO's superior and stable performance in practice.
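The gap between accuracy per communication round and accuracy per unit of wall-clock time can be made concrete with a small sketch. The helper names `rounds_to_accuracy` and `time_to_accuracy` and the toy accuracy and timing traces below are hypothetical, made up for illustration, and are not results from the paper. The sketch simply shows how an algorithm that wins on rounds can still lose on time when each round costs more local computation.

```python
from typing import Optional, Sequence


def rounds_to_accuracy(accuracy: Sequence[float], target: float) -> Optional[int]:
    """Return the 1-based round at which accuracy first reaches `target`, or None."""
    for rnd, acc in enumerate(accuracy, start=1):
        if acc >= target:
            return rnd
    return None


def time_to_accuracy(accuracy: Sequence[float], round_seconds: Sequence[float],
                     target: float) -> Optional[float]:
    """Return cumulative wall-clock seconds until accuracy first reaches `target`, or None."""
    rnd = rounds_to_accuracy(accuracy, target)
    if rnd is None:
        return None
    return float(sum(round_seconds[:rnd]))


# Toy traces (hypothetical): "heavy" does more client computation per round than "light".
heavy_acc, heavy_sec = [0.40, 0.55, 0.62, 0.70], [3.0, 3.0, 3.0, 3.0]
light_acc, light_sec = [0.35, 0.50, 0.60, 0.66, 0.71], [1.0] * 5

print(rounds_to_accuracy(heavy_acc, 0.70), time_to_accuracy(heavy_acc, heavy_sec, 0.70))
print(rounds_to_accuracy(light_acc, 0.70), time_to_accuracy(light_acc, light_sec, 0.70))
```

With these toy traces, the heavier algorithm reaches the target in fewer rounds (4 versus 5) but needs more cumulative time (12.0 versus 5.0 seconds). That reversal is the kind of effect the round-to-accuracy versus time-to-accuracy comparison is meant to expose.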

Paper Structure

This paper contains 17 sections, 5 theorems, 23 equations, 7 figures, 8 tables, 2 algorithms.

Key Result

Lemma 1

The aggregated global gradient $\Delta_t$ can be treated as an exponential moving average of the accumulated local gradients from all clients.
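The exponential-moving-average reading of Lemma 1 can be checked numerically. The sketch below assumes a recursion of the form $\Delta_t = \beta\Delta_{t-1} + (1-\beta)\bar{g}_t$, where $\bar{g}_t$ is a uniform average of the clients' accumulated local gradients in round $t$ and the decay `beta` is a hypothetical parameter. The paper's actual coefficients and client weights are not given on this page. The sketch also compares the recursion with its unrolled form $\Delta_T = (1-\beta)\sum_{t=1}^{T}\beta^{T-t}\bar{g}_t$, which makes the moving-average interpretation explicit.

```python
import numpy as np


def ema_global_gradient(rounds, beta):
    """Hypothetical recursion: Delta_t = beta * Delta_{t-1} + (1 - beta) * mean_i g_i^t.

    `rounds` is a list over FL rounds. Each entry is a list of per-client
    accumulated local gradients (1-D arrays of equal shape). Delta_0 = 0.
    Returns the list [Delta_1, ..., Delta_T].
    """
    delta = np.zeros_like(rounds[0][0], dtype=float)
    history = []
    for client_grads in rounds:
        mean_grad = np.mean(np.stack(client_grads), axis=0)  # uniform client weights (assumption)
        delta = beta * delta + (1.0 - beta) * mean_grad
        history.append(delta.copy())
    return history


def ema_closed_form(rounds, beta):
    """Unrolled form: Delta_T = (1 - beta) * sum_t beta**(T - t) * mean_i g_i^t."""
    T = len(rounds)
    means = [np.mean(np.stack(g), axis=0) for g in rounds]
    return (1.0 - beta) * sum(beta ** (T - t) * m for t, m in enumerate(means, start=1))


rounds = [
    [np.array([1.0, 0.0]), np.array([3.0, 2.0])],  # round 1: two clients
    [np.array([0.0, 2.0]), np.array([2.0, 0.0])],  # round 2
]
history = ema_global_gradient(rounds, beta=0.5)
print(history[-1], ema_closed_form(rounds, beta=0.5))
```

Both calls give the vector $[1, 0.75]$ here. Older rounds are discounted geometrically by $\beta$, which is what "exponential moving average" means in this context.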

Figures (7)

  • Figure 1: Federated Learning with non-IID data: Uniform correction coefficients versus tailored correction coefficients
  • Figure 2: Round-to-accuracy and time-to-accuracy re-evaluations
  • Figure 3: Clients with a larger $\theta_i$ and a larger local-gradient magnitude $\|\Delta_i^t\|$ need larger correction factors $1-\alpha_i^t$.
  • Figure 4: Cumulative local training time required by different algorithms to achieve the target accuracy.
  • Figure 5: Local computation time for clients in every FL round. (The orange bars are the median across all training rounds.)
  • ...and 2 more figures
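The qualitative trend stated in the Figure 3 caption can be illustrated with an invented scoring rule. This sketch assumes that $\theta_i$ is the angle between client $i$'s local gradient and the global gradient. The function `correction_factor` and its cap `c_max` are hypothetical: they form a simple mapping that increases with both $\theta_i$ and the relative magnitude of the local gradient. This is not TACO's formula, which this page does not reproduce, and the sketch stops at the factor $1-\alpha_i^t$ rather than guessing how it enters the local update.

```python
import numpy as np


def cos_theta(local_grad, global_grad, eps=1e-12):
    """Cosine of the (assumed) angle theta_i between a local gradient and the global gradient."""
    c = np.dot(local_grad, global_grad) / (
        np.linalg.norm(local_grad) * np.linalg.norm(global_grad) + eps)
    return float(np.clip(c, -1.0, 1.0))


def correction_factor(local_grad, global_grad, c_max=0.9, eps=1e-12):
    """Hypothetical 1 - alpha_i^t that grows with theta_i and with ||local|| / ||global||."""
    misalignment = (1.0 - cos_theta(local_grad, global_grad, eps)) / 2.0  # 0 aligned, 1 opposite
    rel_magnitude = np.linalg.norm(local_grad) / (np.linalg.norm(global_grad) + eps)
    score = misalignment * rel_magnitude
    return float(c_max * score / (1.0 + score))  # monotone in score, saturates below c_max


global_grad = np.array([1.0, 0.0])
clients = {
    "aligned": np.array([0.5, 0.0]),          # small angle, small magnitude
    "orthogonal": np.array([0.0, 1.0]),       # theta = 90 degrees
    "opposite_large": np.array([-2.0, 0.0]),  # theta = 180 degrees, twice the global norm
}
factors = {name: correction_factor(g, global_grad) for name, g in clients.items()}
alphas = {name: 1.0 - f for name, f in factors.items()}  # alpha_i^t = 1 - correction factor
print(factors)
```

The aligned client receives essentially no correction (about 0), the orthogonal client about 0.3, and the large, opposing client about 0.6. This ordering matches the caption's claim. A uniform coefficient would instead give all three clients the same factor.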

Theorems & Definitions (12)

  • Remark 1
  • Definition 1
  • Definition 2
  • Lemma 1: Update rule of the aggregated global gradient $\Delta_t$
  • Lemma 2: Update rule of the final model output $\mathbf{z}_t$
  • Proof sketch
  • Theorem 1: Error bound with over-correction term and tailored correction coefficients $\alpha_i^t$
  • Corollary 1: Convergence rate with tailored correction coefficients $\alpha_i^t$
  • Corollary 2: Optimal correction factors for different clients
  • Proof of Theorem 1
  • ...and 2 more