Communication-Efficient Federated Learning under Dynamic Device Arrival and Departure: Convergence Analysis and Algorithm Design
Zhan-Lun Chang, Dong-Jun Han, Seyyedali Hosseinalipour, Mung Chiang, Christopher G. Brinton
TL;DR
This work addresses federated learning (FL) under dynamic device arrivals and departures, where the active device set and the optimization target evolve across sessions. It provides a convergence analysis for session-based FL under gradient noise and data heterogeneity, and introduces a plug-and-play dynamic initialization that forms a gradient-similarity-weighted average of prior global models to accelerate recovery after distribution shifts. The initialization is designed to be compatible with existing FL algorithms and is validated in simulations that show order-of-magnitude speedups in reaching a target accuracy and substantial energy savings across diverse datasets and wireless conditions. The approach thus enables faster, more energy-efficient FL in wireless edge deployments subject to frequent device churn.
Abstract
Most federated learning (FL) approaches assume a fixed device set. However, real-world scenarios often involve devices dynamically joining or leaving the system, driven by, e.g., user mobility patterns or handovers across cell boundaries. This dynamic setting introduces unique challenges: (1) the optimization objective evolves with the active device set, unlike the static objective of traditional FL; and (2) the current global model may no longer serve as an effective initialization for subsequent rounds, potentially hindering adaptation, delaying convergence, and reducing resource efficiency. To address these challenges, we first provide a convergence analysis for FL under a dynamic device set, accounting for factors such as gradient noise, local training iterations, and data heterogeneity. Building on this analysis, we propose a model initialization algorithm that enables rapid adaptation whenever devices join or leave the network. Our key idea is to compute a weighted average of previous global models, guided by gradient similarity, so that models trained on data distributions closely aligned with the current device set receive higher weight and the system recovers from distribution shifts in fewer training rounds. This plug-and-play algorithm is designed to integrate seamlessly with existing FL methods, offering broad applicability. Experiments demonstrate that our approach typically converges an order of magnitude or more faster than baselines, which we show drastically reduces the energy consumed to reach a target accuracy.
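To make the key idea concrete, the following is a minimal sketch of a gradient-similarity-weighted average of previous global models. It is an illustration under stated assumptions, not the algorithm as specified in the paper: the abstract does not say how similarity is measured or how it is turned into weights, so the use of cosine similarity, the softmax with a `temperature` parameter, the flattened-vector model representation, and the function names `cosine_similarity` and `similarity_weighted_init` are all hypothetical choices made here. It further assumes that each previous global model comes with one reference gradient vector and that the current device set supplies one aggregated gradient.

```python
import numpy as np


def cosine_similarity(a, b, eps=1e-12):
    """Cosine similarity between two flattened gradient vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def similarity_weighted_init(prev_models, prev_grads, current_grad, temperature=1.0):
    """Blend previous global models into an initialization for the next session.

    prev_models:  list of K flattened parameter vectors, each of shape (D,)
    prev_grads:   list of K reference gradients, one per previous model, shape (D,)
    current_grad: aggregated gradient from the current device set, shape (D,)
    temperature:  assumed softmax temperature; smaller values concentrate
                  the weight on the most similar previous model
    """
    # Score each previous model by how well its gradient aligns with the
    # gradient observed on the current device set.
    sims = np.array([cosine_similarity(g, current_grad) for g in prev_grads])

    # Assumed weighting rule: a numerically stable softmax over similarities.
    logits = sims / temperature
    logits = logits - logits.max()
    weights = np.exp(logits)
    weights = weights / weights.sum()

    # Weighted average of the stored models: (K,) contracted with (K, D) -> (D,).
    init = np.tensordot(weights, np.stack(prev_models), axes=1)
    return init, weights


if __name__ == "__main__":
    # Two toy previous models; the first one's gradient aligns with the current one.
    models = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
    grads = [np.array([1.0, 0.0]), np.array([-1.0, 0.0])]
    init, weights = similarity_weighted_init(models, grads, np.array([1.0, 0.0]))
    print("weights:", weights)
    print("initial model:", init)
```

In this sketch the returned vector would replace the most recent global model as the starting point of the next training session, which is what would let it sit in front of an existing FL method without changing that method's local update or aggregation rules.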
