Scheduling and Aggregation Design for Asynchronous Federated Learning over Wireless Networks

Chung-Hsuan Hu, Zheng Chen, Erik G. Larsson

TL;DR

This work addresses the straggler and communication-bottleneck problems in federated learning over wireless networks by introducing an asynchronous FL framework with periodic aggregation. It combines channel-aware data-importance scheduling with age-aware aggregation to reduce the bias and variance of the aggregated updates. The design is supported by a theoretical convergence analysis and by MNIST-based simulations that show improved convergence over synchronous FedAvg and fully asynchronous FedAsync, especially under non-i.i.d. data. The results provide practical guidelines for wireless FL, including resource allocation, compression strategies, and update-to-update weighting, to achieve faster, more reliable learning with heterogeneous devices. Overall, the design demonstrates that carefully balancing data representativeness, channel quality, and update freshness yields robust performance in resource-constrained FL systems.

Abstract

Federated Learning (FL) is a collaborative machine learning (ML) framework that combines on-device training and server-based aggregation to train a common ML model among distributed agents. In this work, we propose an asynchronous FL design with periodic aggregation to tackle the straggler issue in FL systems. Considering limited wireless communication resources, we investigate the effect of different scheduling policies and aggregation designs on the convergence performance. Driven by the importance of reducing the bias and variance of the aggregated model updates, we propose a scheduling policy that jointly considers the channel quality and training data representation of user devices. The effectiveness of our channel-aware data-importance-based scheduling policy, compared with state-of-the-art methods proposed for synchronous FL, is validated through simulations. Moreover, we show that an "age-aware" aggregation weighting design can significantly improve the learning performance in an asynchronous FL setting.
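
To make the idea of age-aware aggregation concrete, the following is a minimal sketch of one periodic aggregation step. It is not the paper's weighting design, which this summary does not spell out: the function name `age_aware_aggregate`, the geometric weighting `decay ** age`, and the `decay` parameter are all assumptions chosen for illustration. With `decay < 1` older updates are discounted; with `decay > 1` they are emphasized.

```python
import numpy as np


def age_aware_aggregate(theta, updates, ages, decay=0.5):
    """One periodic aggregation step with age-dependent weights (illustrative).

    theta   : current global model parameters, shape (d,)
    updates : list of local model updates (deltas), each of shape (d,)
    ages    : list of non-negative ints, one per update; 0 means the update
              was computed from the most recent global model
    decay   : hypothetical weighting base (> 0); an assumption, not taken
              from the paper
    """
    if len(updates) != len(ages):
        raise ValueError("updates and ages must have the same length")
    if not updates:
        # No device reported in this aggregation period: keep the model.
        return theta.copy()

    # Assumed weighting rule: weight proportional to decay ** age.
    weights = np.array([decay ** a for a in ages], dtype=float)
    weights /= weights.sum()  # normalize so the weights sum to one

    # Weighted sum of the received updates, applied to the global model.
    return theta + weights @ np.stack(updates)


if __name__ == "__main__":
    theta = np.zeros(2)
    updates = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    # The first update is fresh (age 0), the second is one period old.
    print(age_aware_aggregate(theta, updates, ages=[0, 1], decay=0.5))
```

Normalizing the weights keeps the step size comparable across aggregation periods regardless of how many updates arrive, which is one simple way to handle a varying number of participating devices.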

Paper Structure (27 sections, 4 theorems, 67 equations, 10 figures)

This paper contains 27 sections, 4 theorems, 67 equations, 10 figures.

Key Result

Theorem 1

Under Assumptions \ref{asump:lSmooth}-\ref{assump:rmin}, $E=1$, a constant learning rate $\alpha(t)\triangleq\alpha<\frac{\mu r_{\min}}{d\left(2L^2+C_3\right)}$, and partial device participation such that $\Pi(\rho)=\bigcup_{j=0,\ldots,a_{\lim}}\mathcal{M}_j(\rho)\subseteq\mathcal{N}$ for $\rho=1,\ldots,t$, it holds that [...], where $C_3=8L^2\left[\left(1+\frac{d}{4\nu^2}\right)C_2+1\right]$ and [...]. The expectation is taken over the ...
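
The learning-rate condition in Theorem 1 can be evaluated numerically once the constants are known. The sketch below only transcribes the two formulas quoted in the theorem statement; the function name `learning_rate_bound` and the numerical values in the example are hypothetical, and the meaning of each constant is given by the paper's assumptions, which this summary does not reproduce.

```python
def learning_rate_bound(mu, r_min, d, L, nu, C2):
    """Upper bound on the constant learning rate alpha from Theorem 1.

    Implements
        C3    = 8 L^2 [ (1 + d / (4 nu^2)) C2 + 1 ]
        alpha < mu * r_min / ( d * (2 L^2 + C3) )
    using the symbols of the theorem statement.
    """
    C3 = 8 * L**2 * ((1 + d / (4 * nu**2)) * C2 + 1)
    return mu * r_min / (d * (2 * L**2 + C3))


if __name__ == "__main__":
    # Hypothetical constants, chosen only to show the arithmetic:
    # C3 = 8 * 1 * ((1 + 4/4) * 1 + 1) = 24, bound = 1 / (4 * (2 + 24)).
    bound = learning_rate_bound(mu=1.0, r_min=1.0, d=4, L=1.0, nu=1.0, C2=1.0)
    print(bound)
```

Any constant learning rate strictly below the returned value satisfies the condition stated in the theorem for those constants.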

Figures (10)

  • Figure 1: The FL process and information exchange between the server and the participating devices.
  • Figure 2: Conceptual difference between synchronous FL, fully asynchronous FL, and our proposed asynchronous FL with periodic aggregation. $\boldsymbol{\theta}(t)$ represents the model parameter vector in the $t$-th global iteration.
  • Figure 3: The relations between $\mathcal{N}$, $\mathcal{K}(t)$, $\Pi(t)$, and $\{\mathcal{M}_m(t)\}_{m=0}^{a_{\lim}}$ with $a_{\lim}=3$ at iteration $t$, where each colored block represents one device, and same-color devices have the same ALU. An example of normalization scaling for each nonempty $\mathcal{M}_m(t)$ is provided.
  • Figure 4: Test accuracy of the proposed scheme under different partial scheduling ratios, where $N=40$ and $n=50000$.
  • Figure 5: Impact of $\tilde{T}$ on the test accuracy of the proposed asynchronous FL with periodic aggregation in i.i.d. and non-i.i.d. scenarios, where $N=40$, $R=0.2N$, and $n=300000$.
  • ...and 5 more figures

Theorems & Definitions (9)

  • Definition 1
  • Definition 2
  • Theorem 1
  • Remark 1
  • Remark 2
  • Remark 3
  • Lemma 1
  • Lemma 2
  • Lemma 3