FedCanon: Non-Convex Composite Federated Learning with Efficient Proximal Operation on Heterogeneous Data

Yuan Zhou, Jiachen Zhong, Xinli Shi, Guanghui Wen, Xinghuo Yu

TL;DR

FedCanon tackles composite federated learning with non-convex losses and weakly convex regularizers by decoupling proximal operations from local updates and introducing control variables to curb the client drift caused by data heterogeneity. The server performs a single proximal evaluation per round, reducing proximal computation on clients, and a variant, FedCanon II, shifts proximal steps to the clients to further cut communication costs. The authors prove sublinear convergence of the proximal gradient without assuming bounded heterogeneity, and linear convergence under the Polyak-Łojasiewicz (PL) condition. They also report experiments in which FedCanon is more accurate and more computationally efficient than the compared methods on heterogeneous data.

Abstract

Composite federated learning offers a general framework for solving machine learning problems with additional regularization terms. However, existing methods often face significant limitations: many require clients to perform computationally expensive proximal operations, and their performance is frequently vulnerable to data heterogeneity. To overcome these challenges, we propose a novel composite federated learning algorithm called FedCanon, designed to solve optimization problems comprising a possibly non-convex loss function and a weakly convex, potentially non-smooth regularization term. By decoupling proximal mappings from local updates, FedCanon requires only a single proximal evaluation on the server per iteration, thereby reducing the overall proximal computation cost. Concurrently, it integrates control variables into local updates to mitigate the client drift arising from data heterogeneity. The entire architecture avoids the complex subproblems of primal-dual alternatives. The theoretical analysis provides the first rigorous convergence guarantees for this proximal-skipping framework in the general non-convex setting. It establishes that FedCanon achieves a sublinear convergence rate, and a linear rate under the Polyak-Łojasiewicz condition, without the restrictive bounded heterogeneity assumption. Extensive experiments demonstrate that FedCanon outperforms state-of-the-art methods in terms of both accuracy and computational efficiency, particularly under heterogeneous data distributions.
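
The client-drift effect that the control variables target can be seen on a toy problem. The sketch below is not FedCanon's update rule, which this summary does not state; it is a generic SCAFFOLD-style correction on a smooth, regularizer-free objective, with two hypothetical scalar clients $f_i(x)=\frac{a_i}{2}(x-b_i)^2$. All names (`local_steps`, `train`, `clients`) are illustrative assumptions.

```python
def local_steps(x, a, b, beta, K, correction=0.0):
    """Run K local gradient steps on f(z) = a/2 * (z - b)^2, starting from x."""
    z = x
    for _ in range(K):
        z -= beta * (a * (z - b) + correction)
    return z


def train(clients, beta, K, rounds, use_control):
    """Average the clients' local results each round; optionally correct drift."""
    x = 0.0
    for _ in range(rounds):
        # Local gradients at the shared starting point, and their average.
        grads = [a * (x - b) for a, b in clients]
        g = sum(grads) / len(grads)
        results = []
        for (a, b), g_i in zip(clients, grads):
            # Assumed SCAFFOLD-style control term: global minus local gradient.
            corr = (g - g_i) if use_control else 0.0
            results.append(local_steps(x, a, b, beta, K, corr))
        x = sum(results) / len(results)
    return x


# Two heterogeneous clients: different curvature a_i and different optimum b_i.
clients = [(1.0, 0.0), (4.0, 10.0)]
# Minimizer of the average loss: sum(a_i * b_i) / sum(a_i) = 8.0
x_star = sum(a * b for a, b in clients) / sum(a for a, _ in clients)

x_plain = train(clients, beta=0.1, K=5, rounds=100, use_control=False)
x_ctrl = train(clients, beta=0.1, K=5, rounds=100, use_control=True)
print(x_star, x_plain, x_ctrl)
```

In this toy setting, plain averaging settles at a point away from the minimizer of the average loss, while the corrected variant reaches it, because the correction makes every client's local direction vanish at the global stationary point.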

Paper Structure

This paper contains 25 sections, 8 theorems, 55 equations, 3 figures, 6 tables, 2 algorithms.

Key Result

Lemma 1

Suppose that Assumptions 1-5 hold. If $0<\alpha<\frac{1}{\rho}$ and $\beta^2\le \frac{1}{24K(K-1)L^2}$, then it holds that [...], where $\delta=1/(1-\alpha\rho)^2$.
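
The step-size conditions in Lemma 1 can be made concrete with a small sketch. It assumes that $\rho$ is the weak-convexity modulus of the regularizer, $L$ the smoothness constant of the loss, and $K$ the number of local steps; this summary does not define these symbols, so the reading is an assumption. It relies on a standard fact: for a $\rho$-weakly convex function, the proximal operator with step $\alpha<1/\rho$ is single-valued and $1/(1-\alpha\rho)$-Lipschitz, whose square matches $\delta$. The minimax concave penalty (MCP) is used only as a familiar weakly convex example with $\rho=1/\theta$; the source does not say which regularizer the paper uses.

```python
import math


def mcp_prox(v, alpha, lam, theta):
    """Proximal operator of alpha * MCP(.; lam, theta) at the scalar v.

    MCP is rho-weakly convex with rho = 1/theta, so the prox is
    well defined only for 0 < alpha < 1/rho = theta.
    """
    rho = 1.0 / theta
    if not 0.0 < alpha < 1.0 / rho:
        raise ValueError("need 0 < alpha < 1/rho")
    mag = abs(v)
    if mag <= alpha * lam:
        return 0.0  # small inputs are thresholded to zero
    if mag <= theta * lam:
        # Shrink, then rescale by 1 / (1 - alpha * rho).
        return math.copysign((mag - alpha * lam) / (1.0 - alpha * rho), v)
    return v  # large inputs pass through unchanged


def delta(alpha, rho):
    """The constant delta = 1 / (1 - alpha * rho)^2 from Lemma 1."""
    return 1.0 / (1.0 - alpha * rho) ** 2


def max_local_step(K, L):
    """Largest beta with beta^2 <= 1 / (24 K (K - 1) L^2)."""
    if K == 1:
        return math.inf  # the bound is vacuous for a single local step
    return 1.0 / math.sqrt(24.0 * K * (K - 1) * L ** 2)


alpha, lam, theta = 0.5, 1.0, 2.0
print(mcp_prox(0.3, alpha, lam, theta))
print(mcp_prox(1.25, alpha, lam, theta))
print(mcp_prox(3.0, alpha, lam, theta))
print(delta(alpha, 1.0 / theta), max_local_step(K=5, L=1.0))
```

As $\alpha$ approaches $1/\rho$, $\delta$ grows without bound, which is one way to see why the lemma requires the strict inequality $\alpha<1/\rho$.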

Figures (3)

  • Figure 1: Training loss and proximal gradient variations of FedCanon under different global and local step sizes $\alpha$ and $\beta$: (a) and (b) use the same $\beta = 0.02$ while varying $\alpha$; (c) and (d) use the same $\alpha = 0.05$ while varying $\beta$.
  • Figure 2: Performance of FedCanon, FedAvg, SCAFFOLD and SCAFFNEW under different levels of data heterogeneity.
  • Figure 3: Test accuracy variations over training time (seconds) for FedCanon, FedMid, FedDA and ZA1, under different levels of data heterogeneity.

Theorems & Definitions (12)

  • Lemma 1
  • Lemma 2
  • Remark 1
  • Theorem 1
  • Remark 2
  • Remark 3
  • Theorem 2
  • Remark 4
  • Lemma 3
  • Lemma 4
  • ...and 2 more