Methods with Local Steps and Random Reshuffling for Generally Smooth Non-Convex Federated Optimization

Yury Demidovich, Petr Ostroukhov, Grigory Malinovsky, Samuel Horváth, Martin Takáč, Peter Richtárik, Eduard Gorbunov

TL;DR

This work addresses Federated Learning under generalized $(L_0,L_1)$-smoothness, where standard $L$-smoothness is relaxed to better reflect neural network training dynamics. It introduces three algorithms (Clip-LocalGDJ, CLERR, and Clipped RR-CLI) that combine local updates, random reshuffling, and gradient clipping to achieve provable convergence in nonconvex and Polyak–Łojasiewicz (PL) settings, without restrictive data-homogeneity assumptions. The authors provide nonconvex and PL convergence guarantees, recover known results as special cases when $L_1=0$, and extend the analysis to partial participation and RR-based client/data reshuffling. Empirical results on synthetic tasks and ResNet-18 on CIFAR-10 corroborate the theory, demonstrating robustness to data heterogeneity and RR bias, and validating the practical relevance of gradient clipping in generalized-smooth FL. Overall, the paper broadens FL theory to more realistic smoothness models and offers practitioners methods with convergence guarantees under heterogeneity and random reshuffling.
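To make the combination of local steps, a server stepsize, and gradient clipping concrete, here is a minimal Python sketch. It is a generic illustration and not the paper's Clip-LocalGDJ, CLERR, or Clipped RR-CLI: the function names, the choice to clip each local gradient, the quartic client losses, and all parameter values are assumptions made for this example. Random reshuffling and partial participation are not modeled.

```python
import math


def clip(g, c):
    """Rescale vector g so its Euclidean norm is at most c (assumes c > 0)."""
    norm = math.sqrt(sum(v * v for v in g))
    if norm <= c:
        return list(g)
    return [v * (c / norm) for v in g]


def local_clipped_gd(x0, client_grads, rounds, local_steps,
                     local_lr, server_lr, clip_level):
    """Generic local GD with clipped local steps (hypothetical helper)."""
    x = list(x0)
    for _ in range(rounds):
        deltas = []
        for grad in client_grads:
            y = list(x)  # each client starts from the current server model
            for _ in range(local_steps):
                g = clip(grad(y), clip_level)
                y = [yi - local_lr * gi for yi, gi in zip(y, g)]
            # the client reports how far its local run moved from x
            deltas.append([xi - yi for xi, yi in zip(x, y)])
        # the server averages the client displacements and takes its own step
        avg = [sum(d[k] for d in deltas) / len(deltas) for k in range(len(x))]
        x = [xi - server_lr * ak for xi, ak in zip(x, avg)]
    return x


def quartic_grad(b):
    """Gradient of f_i(x) = sum_k (x_k - b)^4 / 4, which is not L-smooth."""
    return lambda x: [(xk - b) ** 3 for xk in x]


# Two heterogeneous clients; by symmetry the average loss is minimized at 0.
clients = [quartic_grad(1.0), quartic_grad(-1.0)]
x_final = local_clipped_gd([10.0], clients, rounds=200, local_steps=5,
                           local_lr=0.1, server_lr=1.0, clip_level=1.0)
print(x_final)
```

Far from the minimizer the quartic gradients are large, so clipping caps every local step at `local_lr * clip_level`; close to it the clipping becomes inactive and the updates reduce to plain local gradient steps.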

Abstract

Non-convex Machine Learning problems typically do not adhere to the standard smoothness assumption. Based on empirical findings, Zhang et al. (2020b) proposed a more realistic generalized $(L_0, L_1)$-smoothness assumption, though it remains largely unexplored. Many existing algorithms designed for standard smooth problems need to be revised. In the context of Federated Learning, however, only a few works address this problem, and they rely on additional limiting assumptions. In this paper, we address this gap in the literature: we propose and analyze new methods with local steps, partial participation of clients, and Random Reshuffling without extra restrictive assumptions beyond generalized smoothness. The proposed methods are based on the proper interplay between clients' and server's stepsizes and gradient clipping. Furthermore, we perform the first analysis of these methods under the Polyak-Łojasiewicz condition. Our theory is consistent with the known results for standard smooth problems, and our experimental results support the theoretical insights.
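For orientation, the two conditions named above are commonly stated as follows. This is a sketch of the standard textbook forms, not a quotation from the paper: the paper's own smoothness assumption is labelled as an asymmetric variant and may be stated in first-order form, and the symbols $\mu$ and $f^*$ are introduced here for illustration. For a twice-differentiable $f$, generalized $(L_0, L_1)$-smoothness in the sense of Zhang et al. (2020b) reads

$$\|\nabla^2 f(x)\| \leq L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x,$$

so that $L_1 = 0$ recovers standard $L$-smoothness with $L = L_0$. The Polyak-Łojasiewicz condition with constant $\mu > 0$ is usually written as

$$\|\nabla f(x)\|^2 \geq 2\mu \left( f(x) - f^* \right) \quad \text{for all } x, \qquad f^* = \inf_x f(x).$$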

Paper Structure

This paper contains 47 sections, 29 theorems, 286 equations, 10 figures, 7 tables, 3 algorithms.

Key Result

Theorem 1

Let Assumptions assn:f_lower_bounded and assn:asym_generalized_smoothness hold. Choose any $P \geq 1$. Choose small local stepsizes $\alpha_p$ and server stepsizes $\gamma_p$ so that $\frac{\zeta}{\hat{a}_p} \leq \gamma_p \leq \frac{1}{4\hat{a}_p}$. Then, the iterates $\left \{ \hat{x}_{t_p} \right \} _

Figures (10)

  • Figure 1: Function residual for \ref{eq:exp:fourth order}, $\alpha_t = 10^{-7}$.
  • Figure 2: Loss, gradient norm, and accuracy on the test dataset for ResNet-18 on CIFAR-10, $\alpha_t = 0.01$.
  • Figure 3: Function residual for \ref{eq:exp:fourth order}, starting from different $x_0$, for different numbers of local steps $\tau$ on the client device.
  • Figure 4: Function residual for \ref{eq:exp:fourth order}, starting from $x_0 = (1, \ldots, 1)$ with batch size 16.
  • Figure 5: Function residual for \ref{eq:exp:fourth order}, $\alpha_t = 10^{-7}$. The best parameters are provided in the legend.
  • ...and 5 more figures

Theorems & Definitions (57)

  • Theorem 1
  • Corollary 1
  • Theorem 2
  • Corollary 2
  • Theorem 3
  • Corollary 3
  • Theorem 4
  • Corollary 4
  • Lemma 1
  • Proof of Lemma \ref{lemma:asym_smoothness}
  • ...and 47 more