
Nonconvex Federated Learning on Compact Smooth Submanifolds With Heterogeneous Data

Jiaojiao Zhang, Jiang Hu, Anthony Man-Cho So, Mikael Johansson

TL;DR

The paper addresses nonconvex federated learning (FL) constrained to compact smooth submanifolds. It proposes a projection-based, drift-corrected algorithm that uses stochastic Riemannian gradients and local updates to improve computation and communication efficiency. It proves sublinear convergence to a neighborhood of a first-order optimal solution, where the size of the neighborhood depends on the gradient variance and the algorithm parameters; the analysis is Lyapunov-based and exploits the manifold's curvature. Experiments on kPCA and low-rank matrix completion (LRMC) show lower communication and runtime costs than existing manifold FL methods while reaching high accuracy. The work makes FL over manifold constraints more practical, scaling under data heterogeneity with full client participation.
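A minimal sketch of a stochastic Riemannian gradient, assuming the unit sphere as the compact smooth submanifold and a hypothetical PCA-style per-sample loss f(x; d) = -(d^T x)^2/2 (neither is taken from the paper). For a submanifold embedded in Euclidean space, the Riemannian gradient is the orthogonal projection of the Euclidean gradient onto the tangent space; on the sphere that projection is g - (x^T g)x. All function names are illustrative.

```python
import random

def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def tangent_project(x, g):
    """Project an ambient vector g onto the tangent space of the unit sphere at x."""
    c = dot(x, g)
    return [gi - c * xi for gi, xi in zip(g, x)]

def euclid_grad_sample(x, d):
    """Euclidean gradient of the hypothetical per-sample loss f(x; d) = -(d^T x)^2 / 2."""
    s = dot(d, x)
    return [-s * di for di in d]

def stochastic_riemannian_grad(x, data, batch_size, rng):
    """Average per-sample Euclidean gradients over a random minibatch, then project to T_x."""
    batch = rng.sample(data, batch_size)
    g = [0.0] * len(x)
    for d in batch:
        for k, gk in enumerate(euclid_grad_sample(x, d)):
            g[k] += gk / batch_size
    return tangent_project(x, g)

rng = random.Random(0)
data = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(50)]  # one client's local samples
x = [0.5, 0.5, 0.5, 0.5]                                             # a point on the unit sphere in R^4
g_sto = stochastic_riemannian_grad(x, data, batch_size=10, rng=rng)
g_full = stochastic_riemannian_grad(x, data, batch_size=len(data), rng=rng)
print(abs(dot(x, g_sto)) < 1e-9)  # the estimate is tangent to the sphere at x: True
```

With batch_size=len(data) the estimate equals the full local Riemannian gradient; smaller batches are cheaper but add variance, the quantity the summary above links to the size of the convergence neighborhood.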

Abstract

Many machine learning tasks, such as principal component analysis and low-rank matrix completion, give rise to manifold optimization problems. Although there is a large body of work studying the design and analysis of algorithms for manifold optimization in the centralized setting, there are currently very few works addressing the federated setting. In this paper, we consider nonconvex federated learning over a compact smooth submanifold in the setting of heterogeneous client data. We propose an algorithm that leverages stochastic Riemannian gradients and a manifold projection operator to improve computational efficiency, uses local updates to improve communication efficiency, and avoids client drift. Theoretically, we show that our proposed algorithm converges sub-linearly to a neighborhood of a first-order optimal solution by using a novel analysis that jointly exploits the manifold structure and properties of the loss functions. Numerical experiments demonstrate that our algorithm has significantly smaller computational and communication overhead than existing methods.
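The ingredients named in the abstract (projection onto the manifold, local updates, and drift correction) can be combined into one communication round as sketched below. This is a hypothetical illustration, not the paper's Algorithm 1: it assumes the unit sphere as the manifold, quadratic losses f_i(x) = -x^T A_i x/2 with diagonal A_i, full-batch local gradients, and a SCAFFOLD-style correction c_i = (average of the clients' Riemannian gradients at x^r) - grad f_i(x^r) swapped in for illustration. The names eta (step size) and tau (number of local steps) are also assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(v):
    """Manifold projection P_M onto the unit sphere: the nearest point v/||v|| (v != 0)."""
    n = math.sqrt(dot(v, v))
    return [vi / n for vi in v]

def riem_grad(A, x):
    """Riemannian gradient of f(x) = -x^T A x / 2 on the sphere; A is diagonal, stored as a list."""
    Ax = [a * xi for a, xi in zip(A, x)]
    q = dot(x, Ax)
    return [-(axi - q * xi) for axi, xi in zip(Ax, x)]

def fed_round(x, clients, eta, tau):
    """One round: drift-corrected local projected steps, then projected averaging at the server."""
    n, dim = len(clients), len(x)
    local = [riem_grad(A, x) for A in clients]                 # each client's gradient at x^r
    avg = [sum(g[k] for g in local) / n for k in range(dim)]   # server-side average
    finals = []
    for A, g_i in zip(clients, local):
        c_i = [a - b for a, b in zip(avg, g_i)]                # correction: global minus local
        z = list(x)
        for _ in range(tau):                                   # tau local steps, no communication
            g = riem_grad(A, z)
            z = project([zk - eta * (gk + ck) for zk, gk, ck in zip(z, g, c_i)])
        finals.append(z)
    return project([sum(z[k] for z in finals) / n for k in range(dim)])

# Heterogeneous clients: alone, client 0 would converge to e1 and client 1 to e2.
clients = [[3.0, 1.0, 0.0], [1.0, 2.0, 0.0]]
x = project([1.0, 1.0, 1.0])
for _ in range(300):
    x = fed_round(x, clients, eta=0.05, tau=5)

mean_A = [sum(col) / len(clients) for col in zip(*clients)]   # global objective uses the mean matrix
g = riem_grad(mean_A, x)
print(math.sqrt(dot(g, g)))                                   # global Riemannian gradient norm
```

With the correction, the first local step on every client moves along the averaged Riemannian gradient at x^r, so the iterate approaches e1, the leading eigenvector of the mean matrix. Because both toy matrices share eigenvectors, this instance only demonstrates the mechanics of a round; it does not measure how large client drift would be without the correction.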

Paper Structure (23 sections, 5 theorems, 54 equations, 9 figures, 1 algorithm)

Key Result

Theorem 4.3

Under Assumptions asm-prox-smooth, asm-smooth, and asm-sgd, if the step sizes satisfy …, where $M=\max\{\mathrm{diam}(\mathcal{M})/\gamma,\ 2\}$, $\mathrm{diam}(\mathcal{M})= \max_{x, y \in \mathcal{M}} \|x-y\|$, $D_f=\max_{i,l,\, x \in \mathcal{M}} \|\nabla f_{il}(x;\mathcal{D}_{il})\|$, and $L_{\mathcal{P}}= \max_{x \in \overline{U}_{\mathcal{M}}(\gamma)} \|D^2 \mathcal{P}_{\mathcal{M}}(x)\|$, … where $\Omega$ …
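Some of the constants in Theorem 4.3 can be evaluated in closed form for a simple instance. The sketch below assumes the unit sphere, whose diameter is 2 (attained by antipodal points x and -x), and a hypothetical per-sample PCA loss f(x; d) = -(d^T x)^2/2, for which ||∇f(x; d)|| = |d^T x| ||d|| ≤ ||d||^2 on the sphere by Cauchy-Schwarz. The value of γ is treated as a free input, and L_P is not computed; all names are illustrative.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def constant_M(diam, gamma):
    """M = max{diam(M)/gamma, 2} from the statement of Theorem 4.3."""
    return max(diam / gamma, 2.0)

def grad_norm_sample(x, d):
    """||grad_x f(x; d)|| for the hypothetical loss f(x; d) = -(d^T x)^2/2, i.e. |d^T x| * ||d||."""
    return abs(dot(d, x)) * norm(d)

def D_f_sphere(datasets):
    """D_f over the unit sphere: by Cauchy-Schwarz the max is ||d||^2, attained at x = d/||d||."""
    return max(dot(d, d) for client in datasets for d in client)

SPHERE_DIAM = 2.0  # attained by antipodal points x and -x

datasets = [[[1.0, 2.0], [0.5, 0.0]], [[3.0, 4.0]]]   # two clients, toy samples d_il in R^2
print(D_f_sphere(datasets))                             # 25.0
print(constant_M(SPHERE_DIAM, gamma=0.5))               # 4.0
```

Larger γ shrinks diam(M)/γ, but M never drops below 2 because of the max.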

Figures (9)

  • Figure 1: kPCA problem with MNIST dataset: Comparison of $\|\mathrm{grad} f(x^r)\|$.
  • Figure 2: kPCA with MNIST dataset: Impact of $\tau$.
  • Figure 3: kPCA with MNIST dataset: Impact of stochastic Riemannian gradients.
  • Figure 4: LRMC: Comparison of $\|\mathrm{grad} f(x^r)\|$.
  • Figure 5: kPCA problem with MNIST dataset: Comparison of $f(x^r)-f^{\star}$.
  • ...and 4 more figures
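Figures 1, 4 and 5 report the Riemannian gradient norm ||grad f(x^r)|| and the optimality gap f(x^r) - f*. The sketch below computes both for a sphere-constrained PCA objective f(x) = -x^T A x/2 with a diagonal A, where f* = -λ_max(A)/2 is available in closed form; the objective and the function name are assumptions for illustration. The second call shows why both metrics are informative: e2 is a first-order stationary point with zero Riemannian gradient, yet its optimality gap is 0.25.

```python
import math

def plotted_metrics(A_diag, x):
    """Return (||grad f(x)||, f(x) - f*) for f(x) = -x^T A x / 2 on the unit sphere, A diagonal."""
    Ax = [a * xi for a, xi in zip(A_diag, x)]
    q = sum(xi * axi for xi, axi in zip(x, Ax))          # Rayleigh quotient x^T A x
    grad = [-(axi - q * xi) for axi, xi in zip(Ax, x)]   # Riemannian gradient on the sphere
    f_val = -0.5 * q
    f_star = -0.5 * max(A_diag)                          # attained at the top eigenvector
    return math.sqrt(sum(gk * gk for gk in grad)), f_val - f_star

A = [2.0, 1.5, 0.0]
print(plotted_metrics(A, [1.0, 0.0, 0.0]))   # (0.0, 0.0): optimal
print(plotted_metrics(A, [0.0, 1.0, 0.0]))   # (0.0, 0.25): stationary but not optimal
```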

Theorems & Definitions (11)

  • Definition 2.1: Riemannian gradient $\mathrm{grad} f(x)$
  • Definition 2.2: $\hat{\gamma}$-proximal smoothness of $\mathcal{M}$
  • Theorem 4.3
  • Lemma A.1
  • Proof
  • Lemma A.2
  • Proof
  • Lemma A.3
  • Proof
  • Lemma A.4
  • ...and 1 more
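Definitions 2.1 and 2.2 can be illustrated on the unit sphere, used here as a hypothetical stand-in for the manifolds in the paper. For a submanifold embedded in Euclidean space with the induced metric, the Riemannian gradient is the orthogonal projection of the Euclidean gradient onto the tangent space T_x M. In the usual sense of proximal smoothness, the unit sphere is 1-proximally smooth: every point at distance less than 1 from it (every x with 0 < ||x|| < 2) has the unique nearest point x/||x||. A direct calculation gives ||x/||x|| - y/||y|||| ≤ ||x - y||/sqrt(||x|| ||y||), so on the tube of radius γ < 1 the projection is 1/(1 - γ)-Lipschitz; the sampling loop below only probes that bound empirically. All names are illustrative.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def proj_sphere(x):
    """P_M for the unit sphere: the unique nearest point x/||x|| (requires x != 0)."""
    n = norm(x)
    return [t / n for t in x]

def riemannian_grad(x, egrad):
    """Definition 2.1 for an embedded submanifold: project the Euclidean gradient onto T_x M."""
    c = sum(a * b for a, b in zip(x, egrad))
    return [g - c * t for g, t in zip(egrad, x)]

def random_point_in_tube(dim, gamma, rng):
    """Sample a point whose distance to the unit sphere is at most gamma."""
    u = proj_sphere([rng.gauss(0.0, 1.0) for _ in range(dim)])
    r = rng.uniform(1.0 - gamma, 1.0 + gamma)
    return [r * t for t in u]

rng = random.Random(1)
gamma = 0.5                    # any gamma < gamma_hat = 1 works for the unit sphere
L = 1.0 / (1.0 - gamma)        # gamma_hat / (gamma_hat - gamma) with gamma_hat = 1
worst = 0.0                    # largest observed ratio ||P(x) - P(y)|| / ||x - y||
for _ in range(1000):
    x = random_point_in_tube(3, gamma, rng)
    y = random_point_in_tube(3, gamma, rng)
    diff_p = norm([a - b for a, b in zip(proj_sphere(x), proj_sphere(y))])
    diff = norm([a - b for a, b in zip(x, y)])
    worst = max(worst, diff_p / diff)
print(worst <= L)
```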