Tackling Intertwined Data and Device Heterogeneities in Federated Learning with Unlimited Staleness

Haoming Wang; Wei Gao

TL;DR

The paper tackles federated learning under intertwined data and device heterogeneities, which can cause unlimited staleness in client updates. Its core idea is a server-side gradient inversion (GI) pipeline: from a client's stale update $w_i^{t-\tau}$, the server estimates the client's local data distribution as a reconstructed dataset $D_{rec}$, then uses $D_{rec}$ to compute an unstale update $\hat{w}_i^t$ for aggregation. In later stages of training, the server switches back to vanilla FL to minimize GI error. The approach preserves client privacy, requires no extra client computation or auxiliary data, and uses sparsification to keep GI computation tractable. Empirical results on MNIST, CIFAR-10, and MDI show up to 25% higher accuracy and up to 35% fewer training epochs than baselines, across both fixed and variant data distributions. The work enables robust, efficient FL in realistic heterogeneous environments and highlights practical scalability and privacy advantages.
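
As a rough illustration of the two server-side mechanisms named above, the following Python sketch (assuming NumPy) shows one possible form of each. It assumes that sparsification means matching only the `ratio` fraction of largest-magnitude entries of an uploaded update, and it models the late-stage switch as a fixed round threshold. `topk_mask`, `sparse_matching_loss`, `select_update`, `ratio`, and `switch_round` are hypothetical names, not taken from the authors' code, and the paper's actual sparsification scheme and switching criterion may differ.

```python
import numpy as np


def topk_mask(update: np.ndarray, ratio: float) -> np.ndarray:
    """Boolean mask keeping the `ratio` fraction of largest-magnitude entries (assumed scheme)."""
    flat = np.abs(update).ravel()
    k = max(1, int(round(ratio * flat.size)))
    top = np.argpartition(flat, -k)[-k:]  # indices of the k largest magnitudes
    mask = np.zeros(flat.size, dtype=bool)
    mask[top] = True
    return mask.reshape(update.shape)


def sparse_matching_loss(sim_update: np.ndarray, stale_update: np.ndarray,
                         mask: np.ndarray) -> float:
    """Squared mismatch between a simulated and an uploaded update, on masked entries only."""
    diff = (sim_update - stale_update)[mask]  # boolean indexing flattens to 1-D
    return float(diff @ diff)


def select_update(round_t: int, switch_round: int,
                  stale_update: np.ndarray, gi_update: np.ndarray) -> np.ndarray:
    """Hypothetical switch rule: GI-based estimate early, raw stale update (vanilla FL) later."""
    return gi_update if round_t < switch_round else stale_update


update = np.array([0.1, -3.0, 0.5, 2.0])
mask = topk_mask(update, ratio=0.5)
print(mask.tolist())                                    # [False, True, False, True]
print(sparse_matching_loss(np.zeros(4), update, mask))  # 13.0
```

Restricting the matching loss to a few large entries shrinks the objective that GI has to fit; a fixed `switch_round` is only the simplest placeholder for deciding when to stop relying on GI.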

Abstract

Federated Learning (FL) can be affected by data and device heterogeneities, caused by clients' different local data distributions and latencies in uploading model updates (i.e., staleness). Traditional schemes treat these heterogeneities as two separate and independent aspects, but this assumption is unrealistic in practical FL scenarios where the heterogeneities are intertwined. In such cases, traditional FL schemes are ineffective, and a better approach is to convert a stale model update into an unstale one. In this paper, we present a new FL framework that ensures the accuracy and computational efficiency of this conversion, hence effectively tackling the intertwined heterogeneities that may cause unlimited staleness in model updates. Our basic idea is to estimate the distributions of clients' local training data from their uploaded stale model updates, and use these estimations to compute unstale client model updates. In this way, our approach requires neither an auxiliary dataset nor fully trained local models at the clients, and does not incur any additional computation or communication overhead at client devices. We compared our approach with existing FL strategies on mainstream datasets and models, and showed that our approach can improve the trained model accuracy by up to 25% and reduce the number of required training epochs by up to 35%. Source code is available at: https://github.com/pittisl/FL-with-intertwined-heterogeneity.
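
The conversion described in the abstract can be illustrated on a toy problem. The Python sketch below (assuming NumPy) uses a linear-regression model, a client that performs a single full-batch gradient step from the stale global model (written here as $w^{t-\tau}$), and hand-derived gradients in place of an autodiff framework; `grad`, `invert`, `n_rec`, and all data values are hypothetical. The server fits a small synthetic dataset $D_{rec}$ (`X_rec`, `y_rec`) so that its gradient at $w^{t-\tau}$ matches the one implied by the uploaded update (gradient inversion), then applies the same local step to $D_{rec}$ from the current global model $w^t$ to estimate the unstale update. Because one update constrains only as many numbers as the model has parameters, $D_{rec}$ is not unique here; the sketch shows the mechanics, not the accuracy reported in the paper.

```python
import numpy as np


def grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)


def invert(w_old, target_grad, n_rec=8, lr=0.05, steps=3000, seed=0):
    """Gradient inversion: fit synthetic data whose gradient at w_old matches target_grad."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_rec, w_old.size))
    y = rng.normal(size=n_rec)
    losses = []
    for _ in range(steps):
        r = X @ w_old - y                   # residuals on the synthetic data
        e = X.T @ r / n_rec - target_grad   # gradient mismatch
        losses.append(0.5 * float(e @ e))
        # Analytic gradients of 0.5 * ||e||^2 w.r.t. the synthetic inputs and labels.
        dX = (np.outer(r, e) + np.outer(X @ e, w_old)) / n_rec
        dy = -(X @ e) / n_rec
        X -= lr * dX
        y -= lr * dy
    return X, y, losses


# Hypothetical client: linear-regression data D_i, one local step with step size eta.
rng = np.random.default_rng(1)
X_i = rng.normal(size=(50, 3))
y_i = X_i @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
eta = 0.1
w_old = np.array([0.2, -0.3, 0.1])   # stale global model the client started from
w_now = np.array([0.6, -1.2, 0.3])   # current global model at the server

stale_update = -eta * grad(w_old, X_i, y_i)            # what the client uploads
X_rec, y_rec, losses = invert(w_old, -stale_update / eta)
unstale_estimate = -eta * grad(w_now, X_rec, y_rec)     # estimated update from w_now
print(f"matching loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

With a neural network and several local steps, the same matching loss would typically be differentiated through the simulated local training with an autodiff framework rather than by hand.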

Paper Structure (25 sections, 8 equations, 18 figures, 21 tables)

Introduction
Background and Motivation
Tackling Intertwined Heterogeneities in FL
Gradient Inversion
Method
Estimating Local Data Distributions from Stale Model Updates
Switching back to Vanilla FL in Later Stages of FL
Computationally Efficient Gradient Inversion
Protecting Clients' Data Privacy
Experiments
Experiment Setup
FL Performance in the Fixed Data Scenario
FL Performance in the Variant Data Scenario
Related Work
Conclusion
...and 10 more sections

Figures (18)

Figure 1: The impact of staleness in FL
Figure 2: Our proposed method of tackling intertwined data and device heterogeneities in FL
Figure 3: Visualization of the loss surface and gradient computed using $D_{rec}$, $D_i$, and random noise data
Figure 4: Our method of gradient inversion based estimation has a smaller error than first-order estimation
Figure 5: Comparison of model updates' estimation error as the FL training progresses
...and 13 more figures