On the Power of Adaptive Weighted Aggregation in Heterogeneous Federated Learning and Beyond

Dun Zeng, Zenglin Xu, Shiyu Liu, Yu Pan, Qifan Wang, Xiaoying Tang

TL;DR

The paper addresses why FedAvg often converges well under heterogeneous clients despite the pessimistic theoretical bounds derived in prior work. It introduces client consensus dynamics and Local Update Diversity (LUD) as practical lenses on training dynamics, and proposes FedAWARE, an adaptive weighted aggregation module that minimizes the norm of the aggregated local updates and can plug into existing FL algorithms. The authors prove that, under standard smoothness and unbiasedness assumptions, FedAvg converges when the consensus measure decays, with a bound that includes a consensus term. Adaptive aggregation reduces this term, yielding faster convergence and improved generalization, and FedAWARE further enlarges LUD to strengthen generalization. Extensive experiments on CIFAR-10/100 and AGNews across multiple architectures show faster convergence, more stable generalization, and compatibility of FedAWARE as a plug-in, supporting its practical use in heterogeneous FL deployments.

Abstract

Federated averaging (FedAvg) is the most fundamental algorithm in federated learning (FL). Previous theoretical results assert that FedAvg convergence and generalization degenerate under heterogeneous clients. However, recent empirical results show that FedAvg can perform well in many real-world heterogeneous tasks. These results reveal an inconsistency between FL theory and practice that is not fully explained. In this paper, we show, based on rigorous convergence analysis, that common heterogeneity measures contribute to this inconsistency. Furthermore, we introduce a new measure, client consensus dynamics, and prove that FedAvg can effectively handle client heterogeneity when an appropriate aggregation strategy is used. Building on this theoretical insight, we present a simple and effective FedAvg variant termed FedAWARE. Extensive experiments on three datasets and two modern neural network architectures demonstrate that FedAWARE ensures faster convergence and better generalization in heterogeneous client settings. Moreover, our results show that FedAWARE can significantly enhance the generalization performance of advanced FL algorithms when used as a plug-in module.
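
The TL;DR describes FedAWARE as choosing aggregation weights that minimize the norm of the aggregated local updates. The sketch below illustrates that idea only and is not the authors' implementation. It assumes the weights are constrained to the probability simplex and solves the resulting problem with a Frank-Wolfe loop with exact line search, which is a solver chosen here for brevity. The function names (`min_norm_weights`, `combine`, `dot`) are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def combine(weights, updates):
    """Weighted sum of client updates: sum_i weights[i] * updates[i]."""
    dim = len(updates[0])
    return [sum(w * u[j] for w, u in zip(weights, updates)) for j in range(dim)]


def min_norm_weights(updates, steps=200):
    """Approximate argmin over the simplex of ||sum_i lam_i * updates[i]||^2.

    Assumption: simplex-constrained weights and a Frank-Wolfe solver;
    neither detail is taken from the summary above.
    """
    n = len(updates)
    lam = [1.0 / n] * n  # start from uniform (FedAvg-style) weights
    for _ in range(steps):
        d = combine(lam, updates)
        # The gradient w.r.t. lam_i is 2 * <updates[i], d>, so the best
        # simplex vertex is the client with the smallest inner product.
        scores = [dot(u, d) for u in updates]
        k = min(range(n), key=scores.__getitem__)
        # Exact line search along d + gamma * (updates[k] - d).
        diff = [a - b for a, b in zip(updates[k], d)]
        denom = dot(diff, diff)
        if denom == 0.0:
            break
        gamma = max(0.0, min(1.0, -dot(d, diff) / denom))
        if gamma == 0.0:
            break
        lam = [(1.0 - gamma) * w for w in lam]
        lam[k] += gamma
    return lam


# Two toy client updates that disagree in direction and scale.
toy_updates = [[2.0, 0.0], [0.0, 1.0]]
toy_weights = min_norm_weights(toy_updates)
toy_direction = combine(toy_weights, toy_updates)
```

In this toy case the adaptive weights shift mass toward the smaller update, so the combined direction has a smaller norm than the uniform average of the two updates.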

Paper Structure

This paper contains 17 sections, 6 theorems, 52 equations, 8 figures, 2 tables, and 1 algorithm.

Key Result

Proposition 3.1

Given a federated optimization method updating the global model by $\boldsymbol{x}^{t+1} = \boldsymbol{x}^t - \eta_g \tilde{\boldsymbol{d}}^t$, where $\tilde{\boldsymbol{d}}^t$ is estimated by the method. We project the $\tilde{\boldsymbol{d}}^t$ to the direction of AWARE $\boldsymbol{d}^t$ by compu

Figures (8)

  • Figure 1: Training dynamics of FedAvg on the Non-IID partitioned CIFAR-10 task. We use a Dirichlet distribution to allocate clients' data, as described in Section \ref{sec:exp}. Dir(0.1) indicates the most heterogeneous FL setting.
  • Figure 2: Training dynamics of raw algorithms. The training loss indicates the convergence speed on training datasets, while the test accuracy indicates the generalization stability against heterogeneous clients.
  • Figure 3: Training dynamics of the FedAWARE extension in the CIFAR-100 setting. The results for CIFAR-10 and AGNews are presented in the Appendix.
  • Figure 4: Ablation study. "FedAWARE-NA" means we only use moving-averaged local updates without adaptive aggregation. "FedAvg-Full" is the result of vanilla FedAvg with full client participation.
  • Figure 5: Visualization of data distribution.
  • ...and 3 more figures
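
The Figure 1 caption mentions allocating clients' data with a Dirichlet distribution, where Dir(0.1) is the most heterogeneous setting. A common way to do this is sketched below, assuming a per-class label split in which each class's samples are divided among clients according to proportions drawn from Dir(alpha). The exact scheme used in the paper is not stated in this summary, so the function `dirichlet_partition` and its parameters are hypothetical.

```python
import random


def dirichlet_partition(labels, num_clients, alpha, seed=0):
    """Split sample indices across clients, class by class.

    Assumption: for every class, client proportions are drawn from a
    symmetric Dirichlet(alpha); a smaller alpha gives more skewed clients.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    clients = [[] for _ in range(num_clients)]
    for y in sorted(by_class):
        idxs = by_class[y]
        rng.shuffle(idxs)
        # A Dirichlet sample is a vector of normalized Gamma(alpha, 1) draws.
        g = [rng.gammavariate(alpha, 1.0) for _ in range(num_clients)]
        total = sum(g)
        if total == 0.0:  # guard against numerical underflow for tiny alpha
            g, total = [1.0] * num_clients, float(num_clients)
        start, acc = 0, 0.0
        for c in range(num_clients):
            acc += g[c] / total
            # The last client takes whatever remains so no index is dropped.
            end = len(idxs) if c == num_clients - 1 else min(len(idxs), round(acc * len(idxs)))
            clients[c].extend(idxs[start:end])
            start = end
    return clients


# Toy labels: 1000 samples over 10 balanced classes, split across 5 clients.
toy_labels = [i % 10 for i in range(1000)]
toy_parts = dirichlet_partition(toy_labels, num_clients=5, alpha=0.1)
```

With a small `alpha` such as 0.1, most of each class tends to land on one or two clients, which is what makes the setting strongly Non-IID.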

Theorems & Definitions (7)

  • Proposition 3.1: AWARE extension
  • Theorem 4.1
  • Theorem 4.2
  • Corollary 4.1
  • Definition 4.1: Local Update Diversity
  • Lemma A.1: Bounded local updates (reddi2020adaptive)
  • Lemma A.2: Tuning the stepsize (koloskova2020unified)