FLARE: A New Federated Learning Framework with Adjustable Learning Rates over Resource-Constrained Wireless Networks
Bingnan Xiao, Jingjing Zhang, Wei Ni, Xin Wang
TL;DR
FLARE addresses data, device, and channel heterogeneity in wireless federated learning by letting each device adjust its own learning rate and number of local iterations. The authors derive a convergence upper bound for non-convex models with non-i.i.d. data and design a nested scheduling framework that alternates between bandwidth allocation (via binary search) and device selection (via a greedy method), with a low-complexity linear-programming policy for models with large Lipschitz constants. Under suitable parameter choices, the approach attains a convergence rate of $O\left(\frac{1}{\sqrt{\bar{\tau} M R}}\right)$, and in experiments on MNIST and CIFAR-10 it reaches higher test accuracy and converges faster than state-of-the-art baselines. These results suggest that FLARE is well suited to training over resource-constrained wireless networks with heterogeneous devices.
Abstract
Wireless federated learning (WFL) suffers from heterogeneity prevailing in the data distributions, computing powers, and channel conditions of participating devices. This paper presents a new Federated Learning with Adjusted leaRning ratE (FLARE) framework to mitigate the impact of the heterogeneity. The key idea is to allow the participating devices to adjust their individual learning rates and local training iterations, adapting to their instantaneous computing powers. The convergence upper bound of FLARE is established rigorously under a general setting with non-convex models in the presence of non-i.i.d. datasets and imbalanced computing powers. By minimizing the upper bound, we further optimize the scheduling of FLARE to exploit the channel heterogeneity. A nested problem structure is revealed to facilitate iteratively allocating the bandwidth with binary search and selecting devices with a new greedy method. A linear problem structure is also identified and a low-complexity linear programming scheduling policy is designed when training models have large Lipschitz constants. Experiments demonstrate that FLARE consistently outperforms the baselines in test accuracy, and converges much faster with the proposed scheduling policy.
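To make the bandwidth-allocation step more concrete, the following is a minimal sketch of a binary search for the smallest bandwidth that lets one device upload its model within a deadline. It is an illustration under stated assumptions, not the paper's formulation: the Shannon-rate uplink model, the per-round deadline, and all names (`uplink_rate`, `min_bandwidth`, `model_bits`, `deadline_s`, and the channel parameters) are hypothetical. The search relies only on the fact that the rate $b \log_2(1 + c/b)$ increases monotonically with the bandwidth $b$.

```python
import math


def uplink_rate(bandwidth_hz, tx_power_w, channel_gain, noise_psd_w_per_hz):
    """Shannon rate in bit/s (assumed channel model, not taken from the paper)."""
    if bandwidth_hz <= 0:
        return 0.0
    snr = tx_power_w * channel_gain / (noise_psd_w_per_hz * bandwidth_hz)
    return bandwidth_hz * math.log2(1.0 + snr)


def min_bandwidth(model_bits, deadline_s, tx_power_w, channel_gain,
                  noise_psd_w_per_hz, max_bandwidth_hz, tol_hz=1.0):
    """Binary search for the smallest bandwidth meeting the upload deadline.

    Returns None if even max_bandwidth_hz is insufficient.
    """
    required_rate = model_bits / deadline_s

    def feasible(b):
        return uplink_rate(b, tx_power_w, channel_gain,
                           noise_psd_w_per_hz) >= required_rate

    if not feasible(max_bandwidth_hz):
        return None

    # Invariant: lo is infeasible, hi is feasible (the rate is increasing in b).
    lo, hi = 0.0, max_bandwidth_hz
    while hi - lo > tol_hz:
        mid = (lo + hi) / 2.0
        if feasible(mid):
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical numbers: 1 Mbit model, 1 s deadline, 1 MHz available.
params = dict(tx_power_w=0.1, channel_gain=1e-8, noise_psd_w_per_hz=1e-17)
b = min_bandwidth(1e6, 1.0, max_bandwidth_hz=1e6, **params)
```

In a nested scheduler of the kind the abstract describes, a routine like this would be called for each candidate device, and a device-selection step would then decide which devices to admit given the total bandwidth budget. How FLARE couples the two steps is specified in the paper, not in this sketch.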
