Reviving Stale Updates: Data-Free Knowledge Distillation for Asynchronous Federated Learning
Baris Askin, Holger R. Roth, Zhenyu Sun, Carlee Joe-Wong, Gauri Joshi, Ziyue Xu
TL;DR
This paper tackles the problem of stale updates in asynchronous federated learning (AFL) by introducing FedRevive, a framework that couples parameter-space aggregation with data-free knowledge distillation (DFKD). FedRevive uses a lightweight, server-side meta-learned generator to synthesize pseudo-samples and performs multi-teacher distillation from a buffer of recent client models. It blends the resulting KD signal with raw updates through an adaptive weighting that increases with update staleness. Empirical results on vision and text benchmarks show that FedRevive converges faster and reaches higher final accuracy than baselines, with training up to 32.1% faster and final accuracy up to 21.5% higher in some setups. The method preserves data privacy and scalability, and it demonstrates that stale client updates contain knowledge that can be transferred effectively without public data, suggesting strong practical potential for large-scale, privacy-preserving AFL deployments.
Abstract
Federated Learning (FL) enables collaborative model training across distributed clients without sharing raw data, yet its scalability is limited by synchronization overhead. Asynchronous Federated Learning (AFL) alleviates this issue by allowing clients to communicate independently, thereby improving wall-clock efficiency in large-scale, heterogeneous environments. However, this asynchrony introduces stale updates (client updates computed on outdated global models) that can destabilize optimization and hinder convergence. We propose FedRevive, an asynchronous FL framework that revives stale updates through data-free knowledge distillation (DFKD). FedRevive integrates parameter-space aggregation with a lightweight, server-side DFKD process that transfers knowledge from stale client models to the current global model without access to real or public data. A meta-learned generator synthesizes pseudo-samples, which enables multi-teacher distillation. A hybrid aggregation scheme that combines raw updates with DFKD updates effectively mitigates staleness while retaining the scalability of AFL. Experiments on various vision and text benchmarks show that FedRevive achieves up to 32.1% faster training and up to 21.5% higher final accuracy than asynchronous baselines.
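
As a rough illustration of the hybrid aggregation idea, the sketch below blends a raw client update with a DFKD-derived update using a weight that grows with staleness. It is a minimal sketch under stated assumptions and not the paper's actual rule: the function names `kd_weight` and `hybrid_update`, the saturating form `s / (s + scale)`, the `scale` and `server_lr` parameters, and the use of flat lists for parameters are all hypothetical choices made for illustration.

```python
def kd_weight(staleness, scale=4.0):
    """Hypothetical blending weight in [0, 1) that increases with staleness.

    The saturating form s / (s + scale) is an assumption for illustration;
    the abstract only states that the weight grows with update staleness.
    """
    if staleness < 0:
        raise ValueError("staleness must be non-negative")
    return staleness / (staleness + scale)


def hybrid_update(global_params, raw_update, kd_update, staleness, server_lr=1.0):
    """Apply a convex blend of the raw update and the distillation update.

    All three vectors are flat lists of floats with the same length.
    A fresh update (staleness 0) is applied as-is; staler updates lean
    more heavily on the distillation-derived direction.
    """
    beta = kd_weight(staleness)
    return [
        w + server_lr * ((1.0 - beta) * r + beta * k)
        for w, r, k in zip(global_params, raw_update, kd_update)
    ]


if __name__ == "__main__":
    params = [0.0, 0.0]
    raw = [1.0, 2.0]   # update computed by a client on an older global model
    kd = [3.0, 4.0]    # update obtained from server-side distillation
    for s in (0, 4, 12):
        print(s, kd_weight(s), hybrid_update(params, raw, kd, s))
```

In this toy form, a client whose update arrives with no staleness contributes its raw update unchanged, while the distillation term dominates as staleness grows. Any monotonically increasing weight would serve the same illustrative purpose.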
