FLARE: A New Federated Learning Framework with Adjustable Learning Rates over Resource-Constrained Wireless Networks

Bingnan Xiao; Jingjing Zhang; Wei Ni; Xin Wang

TL;DR

FLARE addresses data, device, and channel heterogeneity in wireless federated learning by enabling per-device learning-rate adjustments and variable local iterations. The authors derive a convergence upper bound for non-convex, non-i.i.d. settings and design a nested scheduling framework that alternates bandwidth allocation (via binary search) and device selection (via greedy methods), with a linear-programming option when Lipschitz constants are large. The approach yields a convergence rate of $O\left(\frac{1}{\sqrt{\bar{\tau} M R}}\right)$ under suitable parameter choices and demonstrates faster, more accurate training than state-of-the-art baselines in experiments on MNIST and CIFAR-10. These results suggest FLARE provides robust, scalable training for resource-constrained wireless networks with heterogeneous devices.

Abstract

Wireless federated learning (WFL) suffers from heterogeneity prevailing in the data distributions, computing powers, and channel conditions of participating devices. This paper presents a new Federated Learning with Adjusted leaRning ratE (FLARE) framework to mitigate the impact of the heterogeneity. The key idea is to allow the participating devices to adjust their individual learning rates and local training iterations, adapting to their instantaneous computing powers. The convergence upper bound of FLARE is established rigorously under a general setting with non-convex models in the presence of non-i.i.d. datasets and imbalanced computing powers. By minimizing the upper bound, we further optimize the scheduling of FLARE to exploit the channel heterogeneity. A nested problem structure is revealed to facilitate iteratively allocating the bandwidth with binary search and selecting devices with a new greedy method. A linear problem structure is also identified and a low-complexity linear programming scheduling policy is designed when training models have large Lipschitz constants. Experiments demonstrate that FLARE consistently outperforms the baselines in test accuracy, and converges much faster with the proposed scheduling policy.
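The bandwidth-allocation step mentioned above relies on binary search. The following is a minimal sketch of that idea only, not the paper's scheduling algorithm: it assumes a Shannon-capacity uplink rate model (an assumption, since the transmission model is not reproduced here) and finds the smallest bandwidth that lets one device upload its model within a deadline. All function and parameter names (`uplink_rate`, `min_bandwidth`, `noise_psd`, and so on) are hypothetical.

```python
import math


def uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_psd):
    """Assumed Shannon rate in bit/s; it increases monotonically with bandwidth."""
    if bandwidth_hz <= 0:
        return 0.0
    snr = tx_power_w * channel_gain / (bandwidth_hz * noise_psd)
    return bandwidth_hz * math.log2(1.0 + snr)


def min_bandwidth(model_bits, deadline_s, tx_power_w, channel_gain,
                  noise_psd, max_bandwidth_hz, tol_hz=1.0):
    """Binary-search the smallest bandwidth meeting the upload deadline.

    Returns None when even max_bandwidth_hz is insufficient.
    """
    required_rate = model_bits / deadline_s
    if uplink_rate(max_bandwidth_hz, tx_power_w, channel_gain, noise_psd) < required_rate:
        return None
    lo, hi = 0.0, max_bandwidth_hz  # invariant: lo is infeasible, hi is feasible
    while hi - lo > tol_hz:
        mid = 0.5 * (lo + hi)
        if uplink_rate(mid, tx_power_w, channel_gain, noise_psd) >= required_rate:
            hi = mid
        else:
            lo = mid
    return hi


# Illustrative numbers only (not taken from the paper).
params = dict(tx_power_w=0.1, channel_gain=1e-8, noise_psd=4e-21,
              max_bandwidth_hz=1e6)
b = min_bandwidth(model_bits=1e6, deadline_s=1.0, **params)
print("minimum bandwidth (Hz):", b)
```

Binary search applies here because the assumed rate is monotone in bandwidth, so feasibility flips exactly once along the search interval. A scheduler would repeat such a search per device and combine it with device selection, as the abstract describes.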

Paper Structure (25 sections, 4 theorems, 38 equations, 8 figures, 2 algorithms)

Introduction
Related Work
Contribution and Organization
System Model
Federated Learning Model
Local Computation-Communication Latency Model
Local Computation
Wireless Transmission
Problem Formulation of Federated Learning with Dynamically Adjusted Learning Rates
Overview of FLARE
Approximation Error Analysis
WFL Problem Formulation
Convergence Analysis
Proposed Scheduling under FLARE
Problem Reformulation
...and 10 more sections

Key Result

Theorem 1

Suppose that $\mathbf{w}$ is obtained by taking $\overline{\tau}_r$ SGD steps with a learning rate $\eta_{\mathrm{l}}$, and $\widetilde{\mathbf{w}}$ is obtained by taking $\tau_r$ SGD steps with a learning rate $\widetilde{\eta}_{\mathrm{l}} = \eta_{\mathrm{l}} \overline{\tau}_r / \tau_r$. Starting f… where $\sigma^2$ and $g$ denote the SGD variance and the gradient upper bound, respectively.
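The learning-rate rule in Theorem 1 can be illustrated with a small sketch. It is a toy under stated assumptions, not the paper's setup: it uses deterministic gradient descent on the quadratic $f(w) = 0.5 w^2$ instead of SGD on a non-convex model, and the helper names `adjusted_lr` and `run_gd` are hypothetical. The step counts 12, 6, and 3 mirror the flexible-aggregation setting in the Figure 3 caption, with $\overline{\tau}_r$ taken as their maximum.

```python
def adjusted_lr(base_lr, target_steps, actual_steps):
    """Scaled learning rate from Theorem 1: eta_l * tau_bar / tau."""
    if actual_steps <= 0:
        raise ValueError("actual_steps must be positive")
    return base_lr * target_steps / actual_steps


def run_gd(w0, lr, steps):
    """Plain gradient descent on f(w) = 0.5 * w**2, whose gradient is w."""
    w = w0
    for _ in range(steps):
        w -= lr * w
    return w


base_lr = 0.01
local_steps = [12, 6, 3]       # heterogeneous numbers of local updates
tau_bar = max(local_steps)     # reference number of steps

# Model a device would reach with tau_bar steps at the base learning rate.
reference = run_gd(1.0, base_lr, tau_bar)

for tau in local_steps:
    lr = adjusted_lr(base_lr, tau_bar, tau)
    w = run_gd(1.0, lr, tau)
    print(f"tau={tau:2d}  lr={lr:.3f}  w={w:.5f}  gap={abs(w - reference):.5f}")
```

In this toy case, a device that takes fewer steps with a proportionally larger learning rate ends close to the model reached with $\overline{\tau}_r$ steps at the base rate. The small remaining gap is the kind of approximation error that the theorem bounds.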

Figures (8)

Figure 1: The architecture of a WFL system with $M_r$ selected devices at round $r$.
Figure 2: An illustration of model updates in a heterogeneous setting with (a) equal learning rates and (b) FLARE. The green and blue marks represent the minima of global and local objectives, respectively.
Figure 3: Comparison of training loss on the non-i.i.d. MNIST dataset with $K=60$, $M_r=20$, and uniform sampling. For the fixed aggregation, $\tau_{r,i}=7, \forall i \in \mathcal{M}_r$. For the flexible aggregation, the $K$ devices are split into three groups of 20 that perform 12, 6, and 3 local updates, respectively. For FLARE, $\bar{\tau}_r$ is set to the maximum number of local updates among the selected devices in each round.
Figure 4: The performance of different $\bar{\tau}_r$-strategies of FLARE on MNIST with both uniform and non-uniform sampling.
Figure 5: Convergence performance of different policies on MNIST.
...and 3 more figures

Theorems & Definitions (10)

Theorem 1
proof
proof
Theorem 2
proof
proof
Theorem 3
proof
Theorem 4
proof
