
FLARE: A New Federated Learning Framework with Adjustable Learning Rates over Resource-Constrained Wireless Networks

Bingnan Xiao, Jingjing Zhang, Wei Ni, Xin Wang

TL;DR

FLARE addresses data, device, and channel heterogeneity in wireless federated learning by enabling per-device learning-rate adjustments and variable local iterations. The authors derive a convergence upper bound for non-convex, non-i.i.d. settings and design a nested scheduling framework that alternates bandwidth allocation (via binary search) and device selection (via greedy methods), with a linear-programming option when Lipschitz constants are large. The approach yields a convergence rate of $O\left(\frac{1}{\sqrt{\bar{\tau} M R}}\right)$ under suitable parameter choices and demonstrates faster, more accurate training than state-of-the-art baselines in experiments on MNIST and CIFAR-10. These results suggest FLARE provides robust, scalable training for resource-constrained wireless networks with heterogeneous devices.
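The rate is stated only up to constants, so it is most useful for reading off trade-offs. Solving it for the number of rounds $R$ shows them directly. This assumes $\bar{\tau}$ is the reference number of local steps, $M$ is the number of participating devices per round, and $\epsilon$ is a target value for the quantity the rate bounds. These readings and the symbol $\epsilon$ are assumptions here.

```latex
\frac{1}{\sqrt{\bar{\tau} M R}} \le \epsilon
\;\Longleftrightarrow\;
\bar{\tau} M R \ge \frac{1}{\epsilon^{2}}
\;\Longleftrightarrow\;
R \ge \frac{1}{\bar{\tau} M \epsilon^{2}}
```

Under this reading, doubling either $\bar{\tau}$ or $M$ halves the required number of rounds, while halving $\epsilon$ quadruples it. The parameter choices the rate requires still apply.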

Abstract

Wireless federated learning (WFL) suffers from heterogeneity prevailing in the data distributions, computing powers, and channel conditions of participating devices. This paper presents a new Federated Learning with Adjusted leaRning ratE (FLARE) framework to mitigate the impact of the heterogeneity. The key idea is to allow the participating devices to adjust their individual learning rates and local training iterations, adapting to their instantaneous computing powers. The convergence upper bound of FLARE is established rigorously under a general setting with non-convex models in the presence of non-i.i.d. datasets and imbalanced computing powers. By minimizing the upper bound, we further optimize the scheduling of FLARE to exploit the channel heterogeneity. A nested problem structure is revealed to facilitate iteratively allocating the bandwidth with binary search and selecting devices with a new greedy method. A linear problem structure is also identified and a low-complexity linear programming scheduling policy is designed when training models have large Lipschitz constants. Experiments demonstrate that FLARE consistently outperforms the baselines in test accuracy, and converges much faster with the proposed scheduling policy.
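The paper's scheduler minimizes its convergence upper bound with a new greedy rule, and neither is reproduced here. The sketch below illustrates only the nested structure, an outer greedy device selection that calls an inner binary search over bandwidth, using a simpler, hypothetical objective: admit as many devices as possible while each round finishes within a deadline. The binary search works because the total bandwidth needed to finish by a round time $T$ decreases as $T$ grows. The latency model (compute time plus upload time `model_bits / (b_i * s_i)`), the `Device` fields, and all function names are assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Device:
    # Hypothetical per-device parameters (not taken from the paper).
    compute_time: float  # local training time c_i, in seconds
    spectral_eff: float  # uplink spectral efficiency s_i, in bit/s/Hz


def required_bandwidth(devs: Sequence[Device], T: float, model_bits: float) -> float:
    """Total bandwidth needed so every device finishes compute + upload by time T."""
    total = 0.0
    for d in devs:
        slack = T - d.compute_time
        if slack <= 0:
            return float("inf")  # this device cannot finish by T at any bandwidth
        total += model_bits / (d.spectral_eff * slack)
    return total


def allocate_bandwidth(devs: Sequence[Device], model_bits: float, B: float,
                       iters: int = 100) -> Tuple[float, List[float]]:
    """Binary search for the shortest round time T whose bandwidth demand fits in B."""
    lo = max(d.compute_time for d in devs)
    # Feasible upper bound: every device gets slack >= D with D * B = sum(model_bits / s_i).
    hi = lo + (model_bits / B) * sum(1.0 / d.spectral_eff for d in devs)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if required_bandwidth(devs, mid, model_bits) <= B:
            hi = mid  # feasible: try a shorter round
        else:
            lo = mid  # infeasible: the round must be longer
    # At T = hi, each device's bandwidth makes it finish exactly at hi.
    bw = [model_bits / (d.spectral_eff * (hi - d.compute_time)) for d in devs]
    return hi, bw


def greedy_select(devs: Sequence[Device], model_bits: float, B: float,
                  deadline: float) -> List[int]:
    """Try devices in order of standalone latency; keep one if the set still meets the deadline."""
    order = sorted(range(len(devs)),
                   key=lambda i: devs[i].compute_time
                   + model_bits / (devs[i].spectral_eff * B))
    chosen: List[int] = []
    for i in order:
        trial = chosen + [i]
        T, _ = allocate_bandwidth([devs[j] for j in trial], model_bits, B)
        if T <= deadline:
            chosen = trial
    return chosen
```

For example, `greedy_select(devs, model_bits=1.0, B=10.0, deadline=0.4)` keeps a candidate only if the whole set still meets the deadline after the inner binary search re-splits the bandwidth. FLARE also lets each device choose its number of local iterations, which this sketch ignores.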

Paper Structure (25 sections, 4 theorems, 38 equations, 8 figures, 2 algorithms)

Key Result

Theorem 1

Suppose that ${\bf{w}}$ is obtained by taking $\overline{\tau}_r$ SGD steps with learning rate $\eta_{\mathrm{l}}$, and $\widetilde{\bf{w}}$ is obtained by taking $\tau_r$ SGD steps with learning rate $\widetilde{\eta}_{\mathrm{l}} = \eta_{\mathrm{l}}\overline{\tau}_r/\tau_r$. Starting from …, where $\sigma^2$ and $g$ denote the SGD variance and the gradient upper bound, respectively.
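Theorem 1 compares $\overline{\tau}_r$ steps at rate $\eta_{\mathrm{l}}$ with $\tau_r$ steps at rate $\eta_{\mathrm{l}}\overline{\tau}_r/\tau_r$. The theorem's conclusion is truncated above, so this minimal sketch only illustrates the scaling rule. It pairs that rule with a toy check on a one-dimensional quadratic using noiseless gradients. The theorem itself covers stochastic gradients on non-convex models, so the quadratic is an assumption. The function names and the default choice of the largest step count as reference are hypothetical.

```python
from typing import List, Optional, Sequence


def flare_learning_rates(taus: Sequence[int], eta: float,
                         tau_bar: Optional[int] = None) -> List[float]:
    """Per-device rates eta * tau_bar / tau_i, following the scaling in Theorem 1."""
    if tau_bar is None:
        # Hypothetical default: the largest step count among the given devices.
        tau_bar = max(taus)
    return [eta * tau_bar / t for t in taus]


def gd_on_quadratic(w0: float, a: float, eta: float, steps: int) -> float:
    """Toy stand-in for local training: noiseless gradient steps on f(w) = a * w**2 / 2."""
    w = w0
    for _ in range(steps):
        w -= eta * a * w  # the gradient of a * w**2 / 2 is a * w
    return w


if __name__ == "__main__":
    taus = [8, 4, 2]
    rates = flare_learning_rates(taus, eta=0.001)
    ref = gd_on_quadratic(1.0, 1.0, 0.001, taus[0])        # tau_bar steps at the base rate
    scaled = gd_on_quadratic(1.0, 1.0, rates[2], taus[2])  # 2 steps at the scaled rate
    unscaled = gd_on_quadratic(1.0, 1.0, 0.001, taus[2])   # 2 steps at the base rate
    print(rates, ref, scaled, unscaled)
```

Here the scaled two-step run differs from the eight-step reference by roughly $10^{-5}$, while the unscaled two-step run falls short by roughly $6\times10^{-3}$. The two expansions agree only to first order in the learning rate, so the residual gap shrinks as $\eta_{\mathrm{l}}$ does.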

Figures (8)

  • Figure 1: The architecture of a WFL system with $M_r$ selected devices at round $r$.
  • Figure 2: An illustration of model updates of a heterogeneous setting with (a) equal learning rates and (b) FLARE. The green and blue marks represent the minima of global and local objectives, respectively.
  • Figure 3: Comparison of training loss on the non-i.i.d. MNIST dataset with $K=60$, $M_r=20$, and uniform sampling. For fixed aggregation, $\tau_{r,i}=7, \forall i \in \mathcal{M}_r$. For flexible aggregation, the $K$ devices are split into three groups of 20 that perform 12, 6, and 3 local updates, respectively. For FLARE, $\bar{\tau}_r$ is set to the maximum number of local updates among the devices selected in each round.
  • Figure 4: Performance of different strategies for choosing $\bar{\tau}_r$ in FLARE on MNIST, under both uniform and non-uniform sampling.
  • Figure 5: Convergence performance of different policies on MNIST.
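As a worked instance of the FLARE setting in the Figure 3 caption, apply Theorem 1's scaling per device. The symbol $\eta_{r,i}$ for device $i$'s learning rate in round $r$ is notation introduced here. The example assumes at least one 12-update device is selected in round $r$, so that $\bar{\tau}_r = 12$:

```latex
\eta_{r,i} = \eta_{\mathrm{l}}\,\frac{\bar{\tau}_r}{\tau_{r,i}}
\;\Longrightarrow\;
\eta_{r,i} \in \left\{\tfrac{12}{12}\eta_{\mathrm{l}},\ \tfrac{12}{6}\eta_{\mathrm{l}},\ \tfrac{12}{3}\eta_{\mathrm{l}}\right\}
= \{\eta_{\mathrm{l}},\ 2\eta_{\mathrm{l}},\ 4\eta_{\mathrm{l}}\},
\qquad
\eta_{r,i}\,\tau_{r,i} = 12\,\eta_{\mathrm{l}} \ \text{for every group.}
```

Every group thus takes the same nominal total step $\eta_{\mathrm{l}}\bar{\tau}_r$. Under flexible aggregation with a common learning rate, which is assumed here, the 3-update group would move only a quarter as far as the 12-update group.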

Theorems & Definitions (10)

  • Theorem 1
  • proof
  • proof
  • Theorem 2
  • proof
  • proof
  • Theorem 3
  • proof
  • Theorem 4
  • proof