Adaptive Federated Learning with Auto-Tuned Clients
Junhyung Lyle Kim, Mohammad Taha Toghani, César A. Uribe, Anastasios Kyrillidis
TL;DR
This paper addresses the difficulty of tuning client step sizes in federated learning (FL) when data distributions and participation vary across clients. It introduces $\Delta$-SGD, a locality-adaptive variant of SGD in which each client sets its step size at every iteration from the local smoothness of its own objective; the step size can increase as well as decrease over local steps. The authors give a nonconvex convergence bound that depends on the local smoothness $\tilde{L}$ and the gradient noise, along with a convex guarantee based on a Lyapunov function, and report robust empirical performance across multiple datasets, models, and heterogeneity levels without per-task tuning. The method is complementary to server-side adaptive optimizers such as FedAdam and remains effective when the local objective is modified, as in the proximal formulation of FedProx or the model-contrastive formulation of MOON, which makes it a practical answer to client-side tuning in FL.
Abstract
Federated learning (FL) is a distributed machine learning framework where the global model of a central server is trained via multiple collaborative steps by participating clients without sharing their data. Although FL is a flexible framework in which the distribution of local data, participation rate, and computing power of each client can vary greatly, this flexibility gives rise to many new challenges, especially in hyperparameter tuning on the client side. We propose $\Delta$-SGD, a simple step size rule for SGD that enables each client to use its own step size by adapting to the local smoothness of the function each client is optimizing. We provide theoretical and empirical results that show the benefit of client adaptivity in various FL scenarios.
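To make the idea of a step size that adapts to local smoothness concrete, the following is a minimal sketch, not the authors' exact rule. It assumes a step size built from the ratio of the change in iterates to the change in gradients (an estimate of the inverse local smoothness), capped by a bounded growth factor relative to the previous step. The names `adaptive_step` and `run_client`, the parameters `gamma` and `delta` with their default values, and the use of exact gradients on a one-dimensional quadratic are all assumptions made for illustration; the paper should be consulted for the actual rule and its stochastic, multi-client form.

```python
def adaptive_step(x_prev, x_curr, g_prev, g_curr, eta_prev, theta_prev,
                  gamma=1.0, delta=1.0):
    """Return (eta, theta) for the next local step.

    Assumed rule (illustrative, not taken from the paper):
        eta = min(gamma * |dx| / (2 * |dg|), sqrt(1 + delta * theta_prev) * eta_prev)
    where |dx| / |dg| estimates the inverse of the local smoothness constant.
    """
    dx = abs(x_curr - x_prev)
    dg = abs(g_curr - g_prev)
    # The step size may grow, but only by a bounded factor per iteration.
    growth_cap = (1.0 + delta * theta_prev) ** 0.5 * eta_prev
    if dg == 0.0:
        eta = growth_cap  # no curvature information: fall back to the cap
    else:
        eta = min(gamma * dx / (2.0 * dg), growth_cap)
    return eta, eta / eta_prev


def run_client(curvature, x0=1.0, num_steps=50, eta0=1e-3):
    """Run local steps on the hypothetical objective f(x) = 0.5 * curvature * x**2."""
    def grad(x):
        return curvature * x

    # One plain gradient step with a small initial step size to obtain two iterates.
    x_prev, g_prev = x0, grad(x0)
    x_curr = x_prev - eta0 * g_prev
    eta, theta = eta0, 1.0
    etas = [eta0]
    for _ in range(num_steps):
        g_curr = grad(x_curr)
        eta, theta = adaptive_step(x_prev, x_curr, g_prev, g_curr, eta, theta)
        x_prev, g_prev = x_curr, g_curr
        x_curr = x_curr - eta * g_curr
        etas.append(eta)
    return x_curr, etas


# Two hypothetical clients with very different curvature, same initial step size.
x_sharp, etas_sharp = run_client(curvature=4.0)
x_flat, etas_flat = run_client(curvature=0.25)
print("sharp client final step size:", etas_sharp[-1])
print("flat client final step size:", etas_flat[-1])
```

Under these assumptions, both clients start from the same small step size, yet the client with the flatter objective settles on a much larger step than the client with the sharper one, with no manual tuning per client. In an FL round, each client would run such a loop on its own data before the server aggregates the resulting models.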
