Communication-Efficient Federated Learning under Dynamic Device Arrival and Departure: Convergence Analysis and Algorithm Design

Zhan-Lun Chang, Dong-Jun Han, Seyyedali Hosseinalipour, Mung Chiang, Christopher G. Brinton

TL;DR

This work addresses federated learning (FL) under dynamic device arrivals and departures, where the active device set, and hence the optimization objective, evolves across sessions. It provides a convergence analysis for session-based FL that accounts for gradient noise and data heterogeneity, and it introduces a plug-and-play dynamic initialization that forms a gradient-similarity-weighted average of prior global models to accelerate recovery after distribution shifts. The initialization is designed to be compatible with existing FL algorithms, and simulations across diverse datasets and wireless conditions show order-of-magnitude speedups in reaching target accuracy, along with substantial energy savings. Together, these results enable faster, more energy-efficient FL at the wireless edge, which matters for real-world deployments with frequent device churn.

Abstract

Most federated learning (FL) approaches assume a fixed device set. However, real-world scenarios often involve devices dynamically joining or leaving the system, driven by, e.g., user mobility patterns or handovers across cell boundaries. This dynamic setting introduces unique challenges: (1) the optimization objective evolves with the active device set, unlike traditional FL's static objective; and (2) the current global model may no longer serve as an effective initialization for subsequent rounds, potentially hindering adaptation, delaying convergence, and reducing resource efficiency. To address these challenges, we first provide a convergence analysis for FL under a dynamic device set, accounting for factors such as gradient noise, local training iterations, and data heterogeneity. Building on this analysis, we propose a model initialization algorithm that enables rapid adaptation whenever devices join or leave the network. Our key idea is to compute a weighted average of previous global models, guided by gradient similarity, to prioritize models trained on data distributions that closely align with the current device set, thereby recovering from distribution shifts in fewer training rounds. This plug-and-play algorithm is designed to integrate seamlessly with existing FL methods, offering broad applicability. Experiments demonstrate that our approach typically achieves convergence speedups of an order of magnitude or more over baselines, which we show substantially reduces the energy consumed to reach a target accuracy.
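The key idea of gradient-similarity-weighted initialization can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' algorithm: each stored global model is assumed to come with a gradient "signature" vector, similarity is assumed to be cosine similarity to the current device set's signature, and the weights are assumed to come from a softmax with a hypothetical `temperature` parameter. The function name `similarity_weighted_init` is likewise hypothetical.

```python
import numpy as np


def similarity_weighted_init(prev_models, prev_grads, current_grad, temperature=1.0):
    """Hypothetical sketch: blend earlier global models by gradient similarity.

    prev_models  -- list of 1-D parameter vectors, one per earlier session
    prev_grads   -- list of 1-D gradient "signatures", one per stored model
                    (assumption: e.g., the aggregated gradient from that session)
    current_grad -- 1-D gradient signature of the currently active device set
    temperature  -- hypothetical softmax temperature; smaller means sharper weights
    """
    eps = 1e-12  # guards against division by zero for all-zero gradients
    g = current_grad / (np.linalg.norm(current_grad) + eps)
    # Cosine similarity between each stored signature and the current one.
    sims = np.array([np.dot(p, g) / (np.linalg.norm(p) + eps) for p in prev_grads])
    # Softmax (assumed weighting rule): better-aligned sessions get larger weights.
    logits = sims / temperature
    logits -= logits.max()  # shift for numerical stability; weights are unchanged
    weights = np.exp(logits)
    weights /= weights.sum()
    # Convex combination of the stored models: (n,) @ (n, d) -> (d,)
    init = weights @ np.stack(prev_models)
    return init, weights
```

With `temperature=0.1`, a current signature that is nearly aligned with the first stored session places almost all of the weight on the first model; a larger temperature flattens the weights toward a plain average of the stored models.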

Paper Structure (31 sections, 1 theorem, 76 equations, 6 figures, 12 tables, 1 algorithm)

Key Result

Theorem 1

Suppose each device performs local SGD updates and the server aggregates via weighted averaging. Define $e_{\max}^{(s,t)} = \max_{k \in \mathcal{K}^{(s)}} e_k^{(s,t)}$ and $e_{\min}^{(s,t)} = \min_{k \in \mathcal{K}^{(s)}} e_k^{(s,t)}$. If the learning rate $\eta^{(s,t)}$ satisfies …, where $\Lambda^{(s,t)}<1$ is a constant, then for any pattern of device arrivals and departures …
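To make the quantities in Theorem 1 concrete, the sketch below runs one aggregation round in which each active device $k \in \mathcal{K}^{(s)}$ performs $e_k^{(s,t)}$ local SGD steps and the server takes a weighted average. It assumes hypothetical quadratic local losses $F_k(w) = \tfrac12\lVert w - c_k\rVert^2$, aggregation weights proportional to local dataset size, and optional Gaussian gradient noise; these choices, and the name `run_round`, are illustrative and not taken from the paper.

```python
import numpy as np


def run_round(w_global, centers, data_sizes, local_iters, lr, rng, noise_std=0.0):
    """One hypothetical FL round with heterogeneous local iteration counts.

    centers     -- {device_id: c_k} for the active set K^(s); F_k(w) = 0.5*||w - c_k||^2
    data_sizes  -- {device_id: n_k}, used as aggregation weights (assumption)
    local_iters -- {device_id: e_k^(s,t)}, number of local SGD steps per device
    """
    active = list(centers)
    e_max = max(local_iters[k] for k in active)
    e_min = min(local_iters[k] for k in active)
    total = sum(data_sizes[k] for k in active)
    w_new = np.zeros_like(w_global)
    for k in active:
        w = w_global.copy()
        for _ in range(local_iters[k]):
            # Stochastic gradient: exact quadratic gradient plus Gaussian noise.
            grad = (w - centers[k]) + noise_std * rng.standard_normal(w.shape)
            w -= lr * grad
        # Weighted averaging at the server, weights n_k / sum_j n_j.
        w_new += (data_sizes[k] / total) * w
    return w_new, e_max, e_min
```

Devices that take more local steps drift further toward their own optimum $c_k$, which illustrates one reason a convergence bound for this setting might depend on both $e_{\max}^{(s,t)}$ and $e_{\min}^{(s,t)}$.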

Figures (6)

  • Figure 1: Comparison illustrating our proposed dynamic FL framework as a generalization of traditional FL. Traditional FL represents the single-session special case, limited to a fixed device set and a static optimization goal. Our framework generalizes this structure to accommodate dynamic device arrivals and departures across multiple sessions, allowing the optimization goal to evolve and minimize the loss for the currently active set of users.
  • Figure 2: Illustration of Algorithm 1 with $S = 4$ total sessions and $P = 1$ pilot preparation session.
  • Figure 3: Comparative performance analysis of the proposed algorithm against baseline FL methods across diverse datasets. The results validate that our scheme attains target accuracy with significantly fewer communication rounds, thereby effectively reducing communication costs under dynamic scenarios of device arrivals and departures.
  • Figure 4: Impact of system bandwidth on Total Latency (left) and Total Energy Consumption (right) required to reach 97% of the proposed scheme's peak accuracy on the SVHN dataset. The proposed method consistently achieves the lowest latency and energy consumption across all bandwidth configurations (50 to 200 MHz). Insets highlight the performance gap between the Proposed and Previous schemes at a finer scale.
  • Figure 5: Performance comparison of the proposed algorithm implemented with four FL algorithms under "Two-Shard" and "Partial-Overlap" label distributions across selected datasets. The results demonstrate the robustness of our proposed scheme to dynamic data distributions caused by client arrivals and departures.
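Figure 1 describes an optimization goal that tracks the currently active set of users. A minimal sketch of that bookkeeping follows, assuming (not from the paper) that arrivals and departures are given as per-session sets of device IDs and that the session-$s$ objective weights each active device's local loss by its share of the active data, $F^{(s)}(w) = \sum_{k \in \mathcal{K}^{(s)}} \frac{n_k}{\sum_{j \in \mathcal{K}^{(s)}} n_j} F_k(w)$. The names `active_sets` and `objective_weights` are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def active_sets(initial: Iterable[int],
                events: List[Tuple[Set[int], Set[int]]]) -> List[FrozenSet[int]]:
    """Track the active device set K^(s) across sessions.

    events -- one (arrivals, departures) pair per session transition (assumed format)
    """
    sets = [frozenset(initial)]
    for arrivals, departures in events:
        # New devices join, then departing devices leave the active set.
        sets.append((sets[-1] | set(arrivals)) - set(departures))
    return sets


def objective_weights(active: FrozenSet[int], data_sizes: Dict[int, int]) -> Dict[int, float]:
    """Assumed weights n_k / sum_{j in K^(s)} n_j of the session-s objective."""
    total = sum(data_sizes[k] for k in active)
    return {k: data_sizes[k] / total for k in sorted(active)}
```

For example, starting from devices {1, 2, 3}, device 4 arriving while device 1 leaves, followed by device 2 leaving, yields the active sets {1, 2, 3}, {2, 3, 4}, and {3, 4}; the session objective then reweights only the remaining devices, which is why a model trained for an earlier session can be a poor starting point.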

Theorems & Definitions (4)

  • Definition 1: Local Data Variability
  • Theorem 1: Upper Bound on the Gradient Norm
  • Proof of Theorem 1
  • Remark 1: Practical Utility and Integration