Locally Adaptive Federated Learning
Sohom Mukherjee, Nicolas Loizou, Sebastian U. Stich
TL;DR
The paper addresses federated optimization under client heterogeneity by replacing a single global constant stepsize with locally adaptive stepsizes based on the stochastic Polyak stepsize (SPS). It introduces FedSPS, an algorithm in which every client computes its own adaptive stepsize, and a decreasing-stepsize variant, FedDecSPS, that achieves exact convergence in non-interpolating regimes. Theoretical results show sublinear convergence in the convex setting and linear convergence in the strongly convex setting under interpolation; in non-interpolating cases, exact convergence is achievable with decreasing stepsizes. Empirical results demonstrate that FedSPS matches or exceeds tuned FedAvg and FedAMS in both convex and non-convex tasks, while requiring less hyperparameter tuning and offering improved generalization.
Abstract
Federated learning is a paradigm of distributed machine learning in which multiple clients coordinate with a central server to learn a model, without sharing their own training data. Standard federated optimization methods such as Federated Averaging (FedAvg) ensure balance among the clients by using the same stepsize for local updates on all clients. However, this means that all clients need to respect the global geometry of the function, which could yield slow convergence. In this work, we propose locally adaptive federated learning algorithms that leverage the local geometric information for each client function. We show that such locally adaptive methods with uncoordinated stepsizes across all clients can be particularly efficient in interpolated (overparameterized) settings, and analyze their convergence in the presence of heterogeneous data for convex and strongly convex settings. We validate our theoretical claims by performing illustrative experiments for both i.i.d. and non-i.i.d. cases. Our proposed algorithms match the optimization performance of tuned FedAvg in the convex setting, outperform FedAvg as well as state-of-the-art adaptive federated algorithms like FedAMS for non-convex experiments, and come with superior generalization performance.
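
To make the idea of uncoordinated, locally adaptive stepsizes concrete, the following is a minimal sketch of one way local updates with a capped stochastic Polyak stepsize could look. It is an illustration under stated assumptions, not the authors' implementation: the least-squares loss, the lower bound of 0 on each sampled loss, the constants `c = 0.5` and `gamma_b = 1.0`, the toy data, and all function names are hypothetical choices made here. The toy data is chosen so that a single point fits every sample on every client, which corresponds to the interpolation setting.

```python
import random

def sps_stepsize(loss, grad_sq_norm, lower_bound=0.0, c=0.5, gamma_b=1.0):
    """Capped Polyak stepsize: min((loss - lower_bound) / (c * ||g||^2), gamma_b)."""
    if grad_sq_norm == 0.0:
        return 0.0  # zero gradient: nothing to gain from a step on this sample
    return min((loss - lower_bound) / (c * grad_sq_norm), gamma_b)

def local_steps(x, data, num_steps, rng):
    """Run SGD on one client; the stepsize is recomputed from local information only."""
    x = list(x)
    for _ in range(num_steps):
        a, b = rng.choice(data)  # sample one (features, target) pair
        residual = sum(ai * xi for ai, xi in zip(a, x)) - b
        loss = 0.5 * residual ** 2  # sampled least-squares loss (assumed model)
        grad = [residual * ai for ai in a]
        gamma = sps_stepsize(loss, sum(g * g for g in grad))
        x = [xi - gamma * gi for xi, gi in zip(x, grad)]
    return x

def federated_round(x, clients, num_steps, rng):
    """One communication round: clients update locally, the server averages the models."""
    local_models = [local_steps(x, data, num_steps, rng) for data in clients]
    n = len(local_models)
    return [sum(m[j] for m in local_models) / n for j in range(len(x))]

# Toy interpolating problem: every sample on both clients is fit exactly by (1, -2).
clients = [
    [([1.0, 0.0], 1.0), ([1.0, 1.0], -1.0)],
    [([0.0, 1.0], -2.0), ([1.0, -1.0], 3.0)],
]
rng = random.Random(0)
x = [0.0, 0.0]
for _ in range(100):
    x = federated_round(x, clients, num_steps=5, rng=rng)
print(x)
```

No stepsize is tuned or shared between clients in this sketch: each client derives its stepsize from its own sampled loss and gradient, and only the model parameters are communicated.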
