On the Convergence of FedProx with Extrapolation and Inexact Prox

Hanmin Li; Peter Richtárik

TL;DR

The paper analyzes FedExProx, an extrapolated variant of FedProx, under inexact proximal updates in the smooth, globally $\mu$-strongly convex setting. It introduces absolute and relative inexactness models. Under absolute inexactness it establishes convergence to a neighborhood of the solution, and under a bound on the relative inexactness it establishes convergence to the exact solution, with an optimal extrapolation parameter $1/(\gamma L_{\gamma})$. By linking inexact proximal updates to biased compression and biased SGD theory, the authors derive sharper rates and quantify the slowdown through $S(\varepsilon_2)$. They also provide per-client local iteration complexities for GD and AGD to reach the required inexactness. Numerical experiments validate the theory, showing that relative approximation can outperform exact FedProx in some cases and that extrapolation remains beneficial even with inexact proximal updates. The work highlights the practical robustness of server-side extrapolation and offers actionable guidance on implementing inexact proximal computations in federated settings.
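To make the algorithm concrete, here is a minimal Python sketch of FedExProx in which each client approximates its proximal point with local gradient descent until an $\varepsilon_1$-absolute accuracy is certified. Everything about the setup is an assumption chosen for illustration: the toy diagonal quadratics `A` and `B`, the helper names `local_gd_prox`, `extrapolation`, `fedexprox` and `sq_dist`, and the gradient-norm stopping rule are hypothetical and not taken from the paper. Setting `alpha = 1` recovers FedProx, while `extrapolation(gamma)` returns $1/(\gamma L_{\gamma})$ for this toy instance, assuming $L_{\gamma}$ denotes the smoothness constant of the averaged Moreau envelope.

```python
# Hypothetical toy instance (not from the paper): n = 3 clients with separable
# quadratics f_i(x) = 0.5 * sum_j A[i][j] * (x_j - B[j])^2. All clients share the
# minimizer B, as in the interpolation regime.
A = [[4.0, 1.0], [1.0, 9.0], [2.0, 0.5]]
B = [1.0, -2.0]
GAMMA = 1.0  # local step size gamma inside prox_{gamma f_i}


def local_gd_prox(a, x, gamma, eps1, max_iter=10_000):
    """Approximate prox_{gamma f_i}(x) by GD on h(z) = f_i(z) + ||z - x||^2 / (2 gamma).

    h is (1/gamma)-strongly convex, so ||z - prox||^2 <= gamma^2 ||grad h(z)||^2.
    Stopping once this bound is <= eps1 certifies an eps1-absolute approximation.
    """
    lip = max(a) + 1.0 / gamma  # smoothness constant of h
    z = list(x)  # warm start at the current server point
    for _ in range(max_iter):
        g = [a[j] * (z[j] - B[j]) + (z[j] - x[j]) / gamma for j in range(len(z))]
        if gamma ** 2 * sum(gj * gj for gj in g) <= eps1:
            break
        z = [z[j] - g[j] / lip for j in range(len(z))]
    return z


def extrapolation(gamma):
    """Return 1 / (gamma * L_gamma) for the toy instance.

    For diagonal quadratics the averaged Moreau envelope M^gamma has Hessian
    diag(mean_i a_ij / (1 + gamma * a_ij)), so L_gamma is its largest entry.
    """
    l_gamma = max(
        sum(a[j] / (1.0 + gamma * a[j]) for a in A) / len(A) for j in range(len(B))
    )
    return 1.0 / (gamma * l_gamma)


def fedexprox(x0, alpha, gamma, eps1, rounds):
    """Server loop: x <- x + alpha * (mean_i x_tilde_i - x); alpha = 1 is FedProx."""
    x = list(x0)
    for _ in range(rounds):
        proxes = [local_gd_prox(a, x, gamma, eps1) for a in A]
        avg = [sum(p[j] for p in proxes) / len(proxes) for j in range(len(x))]
        x = [x[j] + alpha * (avg[j] - x[j]) for j in range(len(x))]
    return x


def sq_dist(x):
    """Squared distance to the shared minimizer B."""
    return sum((xj - bj) ** 2 for xj, bj in zip(x, B))


alpha = extrapolation(GAMMA)
x_fedexprox = fedexprox([0.0, 0.0], alpha, GAMMA, eps1=1e-10, rounds=5)
x_fedprox = fedexprox([0.0, 0.0], 1.0, GAMMA, eps1=1e-10, rounds=5)
print(sq_dist(x_fedexprox), sq_dist(x_fedprox))
```

On this toy problem, five extrapolated rounds should end much closer to the shared minimizer than five FedProx rounds. The stopping rule only relies on the local subproblem being $(1/\gamma)$-strongly convex, which holds whenever each $f_i$ is convex.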

Abstract

Enhancing the FedProx federated learning algorithm (Li et al., 2020) with server-side extrapolation, Li et al. (2024a) recently introduced the FedExProx method. Their theoretical analysis, however, relies on the assumption that each client computes a certain proximal operator exactly, which is impractical since exact computation is virtually never possible in real settings. In this paper, we investigate the behavior of FedExProx without this exactness assumption in the smooth and globally strongly convex setting. We establish a general convergence result, showing that inexactness leads to convergence to a neighborhood of the solution. Additionally, we demonstrate that, with careful control, the adverse effects of this inexactness can be mitigated. By linking inexactness to biased compression (Beznosikov et al., 2023), we refine our analysis, highlighting the robustness of extrapolation to inexact proximal updates. We also examine the local iteration complexity required by each client to achieve the required level of inexactness using various local optimizers. Our theoretical insights are validated through comprehensive numerical experiments.
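The abstract's point about local iteration complexity can be illustrated on a single ill-conditioned client. The sketch below is a hypothetical example, not the paper's Theorems 4 and 5: it runs plain gradient descent and Nesterov's accelerated gradient descent (the constant-momentum variant for strongly convex problems) on the prox subproblem $h(z) = f(z) + \frac{1}{2\gamma}\|z - x\|^2$, stopping both with the same certificate for an $\varepsilon_1$-absolute approximation. The quadratic `A_DIAG`, the tolerance and all function names are assumptions.

```python
import math

# Hypothetical single client (not from the paper): f(z) = 0.5 * sum_j A_DIAG[j] * z_j^2,
# convex and ill-conditioned, with minimizer 0.
A_DIAG = [1000.0, 0.5]
GAMMA = 1.0


def grad_h(z, x, gamma):
    """Gradient of the prox subproblem h(z) = f(z) + ||z - x||^2 / (2 gamma)."""
    return [a * zj + (zj - xj) / gamma for a, zj, xj in zip(A_DIAG, z, x)]


def certified(z, x, gamma, eps1):
    """h is at least (1/gamma)-strongly convex, so gamma^2 ||grad h(z)||^2 upper-bounds
    ||z - prox_{gamma f}(x)||^2; True means z is an eps1-absolute approximation."""
    return gamma ** 2 * sum(g * g for g in grad_h(z, x, gamma)) <= eps1


def local_gd(x, gamma, eps1, max_iter=100_000):
    lip = max(A_DIAG) + 1.0 / gamma  # smoothness constant of h
    z = list(x)
    for k in range(max_iter):
        if certified(z, x, gamma, eps1):
            return z, k  # k = number of local gradient steps taken
        z = [zj - gj / lip for zj, gj in zip(z, grad_h(z, x, gamma))]
    return z, max_iter


def local_agd(x, gamma, eps1, max_iter=100_000):
    """Nesterov's accelerated GD with constant momentum for strongly convex h."""
    lip = max(A_DIAG) + 1.0 / gamma
    mu = 1.0 / gamma  # strong convexity we can guarantee when f is only convex
    q = math.sqrt(lip / mu)
    beta = (q - 1.0) / (q + 1.0)
    z, y = list(x), list(x)
    for k in range(max_iter):
        if certified(z, x, gamma, eps1):
            return z, k
        z_next = [yj - gj / lip for yj, gj in zip(y, grad_h(y, x, gamma))]
        y = [zn + beta * (zn - zj) for zn, zj in zip(z_next, z)]
        z = z_next
    return z, max_iter


def exact_prox(x, gamma):
    """Closed form for this quadratic, used only to check the approximations."""
    return [xj / (1.0 + gamma * a) for a, xj in zip(A_DIAG, x)]


x = [1.0, 1.0]
z_gd, it_gd = local_gd(x, GAMMA, eps1=1e-12)
z_agd, it_agd = local_agd(x, GAMMA, eps1=1e-12)
print(it_gd, it_agd)  # local steps each method needed
```

Here the subproblem's condition number is roughly $1 + \gamma L_f$, so GD needs on the order of $\kappa$ local steps while AGD needs on the order of $\sqrt{\kappa}$; the exact counts depend on the toy data.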

Paper Structure (46 sections, 9 theorems, 172 equations, 6 figures, 2 tables, 2 algorithms)

Introduction
Contributions
Related work
Mathematical background
Absolute approximation in distance
Relative approximation in distance
Achieving the level of inexactness
Conclusions
Limitations
Future work
Notations
Facts and lemmas
Theory of biased SGD
Theory of biased compression
Analysis of inexact FedExProx in the client sampling setting
...and 31 more sections

Key Result

Theorem 1

Assume the Differentiability, Interpolation Regime, Individual Convexity, Smoothness and Global Strong Convexity assumptions hold. If each client only computes an $\varepsilon_1$-absolute approximation $\tilde{x}_{i, k+1}$, in squared distance, of … where $\mathcal{E}_k = \gamma M^{\gamma}(x_k) - \gamma M^{\gamma}_{\inf}$. Specifically,
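The neighborhood effect of absolute inexactness can be seen on a hypothetical toy instance. In the Python sketch below, every client returns its exact proximal point plus a fixed offset `bias` with $\|\text{bias}\|^2 = \varepsilon_1$. This is one admissible, deliberately adversarial $\varepsilon_1$-absolute approximation, not the error model of the paper's experiments. We assume the extrapolation $\alpha = 1/(\gamma L_{\gamma})$ and use $M^{\gamma}_{\inf} = 0$, which holds here because all $f_i$ vanish at their shared minimizer; the data and function names are illustrative.

```python
# Hypothetical toy instance (not from the paper): f_i(x) = 0.5 * sum_j A[i][j] * (x_j - B[j])^2,
# all minimized at B (interpolation regime), so M^gamma_inf = 0.
A = [[4.0, 1.0], [1.0, 9.0], [2.0, 0.5]]
B = [1.0, -2.0]
GAMMA = 1.0
N, D = len(A), len(B)


def prox(a, x, gamma):
    """Closed-form prox_{gamma f_i}(x) for a diagonal quadratic."""
    return [(x[j] + gamma * a[j] * B[j]) / (1.0 + gamma * a[j]) for j in range(D)]


def moreau_avg(x, gamma):
    """M^gamma(x): average over clients of min_z f_i(z) + ||z - x||^2 / (2 gamma)."""
    return sum(
        0.5 * sum(a[j] / (1.0 + gamma * a[j]) * (x[j] - B[j]) ** 2 for j in range(D))
        for a in A
    ) / N


def lyapunov_after(alpha, gamma, bias, rounds=200):
    """Run FedExProx with x_tilde_i = prox_i + bias; return E_k = gamma * M^gamma(x_k)."""
    x = [0.0] * D
    for _ in range(rounds):
        proxes = [prox(a, x, gamma) for a in A]
        avg = [sum(p[j] for p in proxes) / N + bias[j] for j in range(D)]
        x = [x[j] + alpha * (avg[j] - x[j]) for j in range(D)]
    return gamma * moreau_avg(x, gamma)  # minus gamma * M^gamma_inf, which is 0 here


l_gamma = max(sum(a[j] / (1.0 + GAMMA * a[j]) for a in A) / N for j in range(D))
alpha = 1.0 / (GAMMA * l_gamma)
e_exact = lyapunov_after(alpha, GAMMA, bias=[0.0, 0.0])
e_inexact = lyapunov_after(alpha, GAMMA, bias=[1e-2, 0.0])  # eps1 = 1e-4
print(e_exact, e_inexact)
```

The exact run drives $\mathcal{E}_k$ to zero up to floating-point error, while the biased run stalls at a positive level (about $7.6 \times 10^{-5}$ for this offset), a floor that shrinks as $\varepsilon_1$ does.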

Figures (6)

Figure 1: Comparison of FedProx, FedExProx with exact proximal evaluations, FedExProx with $\varepsilon_1$-absolute approximation and FedExProx with $\varepsilon_2$-relative approximation. In this case, we fix $\varepsilon_1 = 0.001$ and $\varepsilon_2 = 0.01$, and pick the local step size $\gamma \in \left\{1000, 100, 10, 1, 0.1, 0.01\right\}$. The $y$-axis is the squared distance to the minimizer of $f$, and the $x$-axis is the number of iterations.
Figure 2: Comparison of FedProx, FedExProx with exact proximal evaluations, FedExProx with $\varepsilon_1$-absolute approximation and FedExProx with $\varepsilon_2$-relative approximation. In this case, we fix $\varepsilon_1 = 10^{-6}$ and $\varepsilon_2 = 0.001$, and pick the local step size $\gamma \in \left\{1000, 100, 10, 1, 0.1, 0.01\right\}$. The $y$-axis is the squared distance to the minimizer of $f$, and the $x$-axis is the number of iterations.
Figure 3: Comparison of FedExProx with $\varepsilon_1$-absolute approximation under different levels of inexactness. We select $\gamma$ from the set $\left\{0.1, 1, 10\right\}$ and, for each choice of $\gamma$, select $\varepsilon_1$ from the set $\left\{0.001, 0.005, 0.01, 0.05, 0.1\right\}$. The $y$-axis denotes the squared distance to the minimizer and the $x$-axis is the number of iterations.
Figure 4: Comparison of FedExProx with $\varepsilon_2$-relative approximation under different levels of inexactness. We select $\gamma$ from the set $\left\{0.01, 0.05, 0.1\right\}$ and, for each choice of $\gamma$, select $\varepsilon_2$ from the set $\left\{0.001, 0.005, 0.01, 0.05, 0.1\right\}$. The $y$-axis denotes the squared distance to the minimizer and the $x$-axis is the number of iterations.
Figure 5: Comparison of FedExProx with $\varepsilon_2$-relative approximation under different levels of inexactness using gradient-diversity-based extrapolation. We select $\gamma$ from the set $\left\{1, 0.1, 0.01\right\}$ and, for each choice of $\gamma$, select $\varepsilon_2$ from the set $\left\{0.0001, 0.05\right\}$. The $y$-axis denotes the squared distance to the minimizer and the $x$-axis is the number of iterations.
...and 1 more figure

Theorems & Definitions (16)

Definition 1: Proximal operator
Definition 2: Moreau envelope
Definition 3: Absolute approximation
Theorem 1
Definition 4: Relative approximation
Theorem 2
Theorem 3
Theorem 4: Local computation via GD
Theorem 5: Local computation via AGD
Lemma 1: PL-inequality
...and 6 more
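For reference, the objects named in Definitions 1 to 4 usually take the standard forms below. They are written as we understand them, so the paper's exact statements (for example, any restriction on $\varepsilon_2$) may differ. The last line sketches one way to read the link to biased compression: take $v = x - \operatorname{prox}_{\gamma f}(x)$, which equals $\gamma \nabla M^{\gamma}_f(x)$ for convex $f$, and define $\mathcal{C}(v) := x - \tilde{x}$. The relative condition then becomes a contraction bound.

```latex
\begin{align*}
\operatorname{prox}_{\gamma f}(x) &= \arg\min_{z \in \mathbb{R}^d} \Big\{ f(z) + \tfrac{1}{2\gamma}\|z - x\|^2 \Big\},
&
M^{\gamma}_{f}(x) &= \min_{z \in \mathbb{R}^d} \Big\{ f(z) + \tfrac{1}{2\gamma}\|z - x\|^2 \Big\}, \\
\|\tilde{x} - \operatorname{prox}_{\gamma f}(x)\|^2 &\le \varepsilon_1 \quad \text{(absolute)},
&
\|\tilde{x} - \operatorname{prox}_{\gamma f}(x)\|^2 &\le \varepsilon_2 \,\|x - \operatorname{prox}_{\gamma f}(x)\|^2 \quad \text{(relative)}, \\
\|\mathcal{C}(v) - v\|^2 &= \|\operatorname{prox}_{\gamma f}(x) - \tilde{x}\|^2 \le \varepsilon_2 \|v\|^2 = (1 - \delta)\|v\|^2,
&
\delta &= 1 - \varepsilon_2 .
\end{align*}
```

So for $\varepsilon_2 < 1$ the inexact update acts like a deterministic contractive (biased) compressor applied to $\gamma \nabla M^{\gamma}_f(x)$, which is what lets biased-compression and biased-SGD analyses apply. The parameterization $\delta = 1 - \varepsilon_2$ is our illustration and not necessarily the one used in the paper.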
