Reviving Stale Updates: Data-Free Knowledge Distillation for Asynchronous Federated Learning

Baris Askin, Holger R. Roth, Zhenyu Sun, Carlee Joe-Wong, Gauri Joshi, Ziyue Xu

TL;DR

This paper tackles the problem of stale updates in asynchronous federated learning (AFL) by introducing FedRevive, a framework that couples parameter-space aggregation with data-free knowledge distillation (DFKD). FedRevive uses a lightweight, server-side meta-learned generator to synthesize pseudo-samples and performs multi-teacher distillation from a buffer of recent client models, blending the KD signal with raw updates via an adaptive weight that increases with update staleness. Empirical results on vision and text benchmarks show that FedRevive converges faster and reaches higher final accuracy than asynchronous baselines, with gains of up to 32.1% in training speed and up to 21.5% in final accuracy in some setups. The method preserves data privacy and scalability, and it shows that stale client updates still carry knowledge that can be transferred to the current global model without public data, suggesting strong practical potential for large-scale, privacy-preserving AFL deployments.
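The TL;DR states that the KD signal is blended with raw updates through a weight that grows with staleness, but it does not give the weighting function. The sketch below is a minimal illustration under stated assumptions: the blend is a convex combination of the raw client delta and a DFKD delta, and the weight follows a hypothetical saturating schedule w(τ) = α_max(1 − e^(−τ/s)). The names `kd_weight`, `hybrid_update`, `alpha_max`, and `scale` are illustrative only, not taken from the paper.

```python
import math
from typing import List


def kd_weight(staleness: int, alpha_max: float = 0.9, scale: float = 5.0) -> float:
    """Hypothetical staleness-to-weight schedule (an assumption, not the paper's).

    Returns 0 for a fresh update and rises toward alpha_max as staleness grows,
    so stale updates lean more heavily on the distilled (KD) delta.
    """
    if staleness < 0:
        raise ValueError("staleness must be non-negative")
    return alpha_max * (1.0 - math.exp(-staleness / scale))


def hybrid_update(raw_delta: List[float], kd_delta: List[float], staleness: int) -> List[float]:
    """Convex blend of a raw client delta and a DFKD delta, weighted by staleness."""
    if len(raw_delta) != len(kd_delta):
        raise ValueError("deltas must have the same length")
    w = kd_weight(staleness)
    # (1 - w) keeps the raw parameter-space update; w adds the distilled update.
    return [(1.0 - w) * r + w * k for r, k in zip(raw_delta, kd_delta)]
```

With this schedule a fresh update (staleness 0) passes through unchanged, while a very stale one is dominated by the KD delta but never fully discards the raw update, since the weight is capped at `alpha_max`.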

Abstract

Federated Learning (FL) enables collaborative model training across distributed clients without sharing raw data, yet its scalability is limited by synchronization overhead. Asynchronous Federated Learning (AFL) alleviates this issue by allowing clients to communicate independently, thereby improving wall-clock efficiency in large-scale, heterogeneous environments. However, this asynchrony introduces stale updates (client updates computed on outdated global models) that can destabilize optimization and hinder convergence. We propose FedRevive, an asynchronous FL framework that revives stale updates through data-free knowledge distillation (DFKD). FedRevive integrates parameter-space aggregation with a lightweight, server-side DFKD process that transfers knowledge from stale client models to the current global model without access to real or public data. A meta-learned generator synthesizes pseudo-samples, which enables multi-teacher distillation. A hybrid aggregation scheme that combines raw updates with DFKD updates effectively mitigates staleness while retaining the scalability of AFL. Experiments on various vision and text benchmarks show that FedRevive achieves up to 32.1% faster training and up to 21.5% higher final accuracy compared to asynchronous baselines.
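The abstract does not specify the distillation objective. As an assumption, the sketch below uses a common multi-teacher setup: on each generated pseudo-sample, average the temperature-softened predictions of the buffered client models (teachers) and measure the KL divergence from that average to the global model's (student's) prediction. The generator itself is not modeled; each pseudo-sample is represented only by the logits the models produce on it, and the T² gradient scaling often used in KD is omitted. Function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def multi_teacher_kd_loss(
    student_logits: Sequence[Sequence[float]],
    teacher_logits: Sequence[Sequence[Sequence[float]]],
    temperature: float = 2.0,
) -> float:
    """Mean over pseudo-samples of KL(average teacher distribution || student distribution).

    student_logits: [num_samples][num_classes] logits from the global model.
    teacher_logits: [num_teachers][num_samples][num_classes] logits from buffered client models.
    """
    if not student_logits or not teacher_logits:
        raise ValueError("need at least one pseudo-sample and one teacher")
    n = len(student_logits)
    loss = 0.0
    for i in range(n):
        # Soft targets: uniform average of the teachers' softened predictions.
        teacher_probs = [softmax(t[i], temperature) for t in teacher_logits]
        k = len(teacher_probs)
        num_classes = len(teacher_probs[0])
        target = [sum(p[c] for p in teacher_probs) / k for c in range(num_classes)]
        student = softmax(student_logits[i], temperature)
        # KL divergence; zero-probability target entries contribute nothing.
        loss += sum(p * math.log(p / q) for p, q in zip(target, student) if p > 0)
    return loss / n
```

A uniform teacher average is the simplest aggregation choice; weighting teachers by staleness or confidence would be a natural variant, but the summary does not say which FedRevive uses.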

Paper Structure

This paper contains 35 sections, 12 equations, 16 figures, 3 tables, 1 algorithm.

Figures (16)

  • Figure 1: Distribution of update staleness across server rounds for a fully asynchronous setup. The distribution aligns with real-world AFL systems reported in FedBuff.
  • Figure 2: CIFAR-10 test accuracy over simulated time. FedRevive converges faster and attains higher final accuracy than the FedBuff baseline (0.83 ± 0.01 vs 0.69 ± 0.01).
  • Figure 3: CIFAR-100 test accuracy vs simulated time. The performance gap between FedRevive (final accuracy: 0.51 ± 0.01) and FedBuff (final accuracy: 0.46 ± 0.01) widens once the client models are trained well enough to serve as effective teachers in this harder 100-class task.
  • Figure 4: FEMNIST accuracy curves. This task is simpler than CIFAR-10 and CIFAR-100, so all asynchronous baselines converge at similar rates; even so, FedRevive (final accuracy: 0.78 ± 0.00) converges faster than FedBuff (final accuracy: 0.75 ± 0.01).
  • Figure 5: 20NewsGroups test accuracy curves. FedRevive and FedBuff reach the same final accuracy (0.68), with FedRevive converging faster than the baseline.
  • ...and 11 more figures
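Figure 1 reports update staleness per server round, and FedBuff is the main baseline in the figures. A minimal sketch, assuming staleness is the number of global-model versions published between a client downloading the model and the server receiving its update, and a FedBuff-style server that applies the mean of buffered updates after every K arrivals. The 1/sqrt(1 + τ) down-weighting is an assumed choice for illustration, not something stated in this paper, and the class and field names are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class ClientUpdate:
    base_version: int    # global-model version the client trained from
    delta: List[float]   # client's parameter delta


class BufferedServer:
    """Hypothetical FedBuff-style server: aggregates after every `buffer_size` arrivals."""

    def __init__(self, num_params: int, buffer_size: int = 3, server_lr: float = 1.0):
        self.params = [0.0] * num_params
        self.version = 0
        self.buffer_size = buffer_size
        self.server_lr = server_lr
        self.buffer: List[List[float]] = []
        self.staleness_log: List[int] = []  # one entry per received update (cf. Figure 1)

    def receive(self, update: ClientUpdate) -> None:
        # Staleness = versions published since the client downloaded the model.
        staleness = self.version - update.base_version
        self.staleness_log.append(staleness)
        weight = 1.0 / math.sqrt(1.0 + staleness)  # assumed staleness down-weighting
        self.buffer.append([weight * d for d in update.delta])
        if len(self.buffer) == self.buffer_size:
            self._aggregate()

    def _aggregate(self) -> None:
        k = len(self.buffer)
        for j in range(len(self.params)):
            # Apply the mean of the buffered (weighted) deltas, scaled by the server learning rate.
            self.params[j] += self.server_lr * sum(u[j] for u in self.buffer) / k
        self.buffer.clear()
        self.version += 1
```

For example, with `buffer_size=2`, two fresh updates trained from version 0 trigger one aggregation and advance the server to version 1; a third update that also started from version 0 then arrives with staleness 1 and is down-weighted before buffering. A histogram of `staleness_log` over many simulated clients gives the kind of distribution Figure 1 plots.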