Riemannian Federated Learning via Averaging Gradient Streams

Zhenwei Huang, Wen Huang, Pratik Jawanpuria, Bamdev Mishra

TL;DR

RFedAGS addresses federated learning on Riemannian manifolds under partial participation and non-IID data by introducing averaging gradient streams as the server aggregation, enabling unbiased global updates via vector transport on curved spaces. The authors prove global convergence with decaying step sizes and convergence to a neighborhood with fixed step sizes, under standard FL and Riemannian optimization assumptions plus a nontrivial probability-assignment condition whose validity is supported by frequency-based probability estimates. Empirical results on PCA (Stiefel), HSP (hyperbolic), and FMC (SPD) demonstrate that RFedAGS consistently outperforms existing Riemannian FL methods under arbitrary participation and data heterogeneity, with robust performance even when true participation probabilities are unknown. The approach broadens the applicability of federated learning to manifold-valued problems, offering a practical and theoretically sound framework for distributed optimization on curved spaces.

Abstract

Federated learning (FL), as a distributed learning paradigm, has a significant advantage in addressing large-scale machine learning tasks. In the Euclidean setting, FL algorithms have been extensively studied, with both theoretical and empirical success. However, few works investigate federated learning algorithms in the Riemannian setting. In particular, critical challenges such as partial participation and data heterogeneity among agents have not been explored in the Riemannian federated setting. This paper presents and analyzes a Riemannian FL algorithm, called RFedAGS, based on a new, efficient server aggregation, averaging gradient streams, which can simultaneously handle partial participation and data heterogeneity. We show theoretically that RFedAGS converges globally at a sublinear rate with decaying step sizes, and converges sublinearly/linearly to a neighborhood of a stationary point/solution with fixed step sizes. These analyses rely on a vital and non-trivial assumption induced by partial participation, which is shown to hold with high probability. Extensive experiments on synthetic and real-world data demonstrate the good performance of RFedAGS.
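
The abstract does not spell out the aggregation rule, so the following is only a rough sketch of one plausible reading of "averaging gradient streams", not the authors' algorithm. It assumes the unit sphere as the manifold, projection onto the tangent space as the vector transport, normalization as the retraction, and a toy leading-eigenvector objective with diagonal local matrices. All function names, step sizes, and participation probabilities are hypothetical choices made for illustration.

```python
import math
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(x, v):
    """Orthogonal projection of v onto the tangent space of the unit sphere at x."""
    c = dot(x, v)
    return [vi - c * xi for vi, xi in zip(v, x)]

def retract(x, v):
    """Retraction on the sphere: step along v, then renormalize."""
    y = [xi + vi for xi, vi in zip(x, v)]
    n = math.sqrt(dot(y, y))
    return [yi / n for yi in y]

def rgrad(diag, x):
    """Riemannian gradient of f(x) = -x^T A x, with A diagonal (stored as a list)."""
    return project(x, [-2.0 * a * xi for a, xi in zip(diag, x)])

def local_stream(diag, x_server, steps, lr):
    """Run local steps; sum the local gradients after moving each one
    to the tangent space at the server point (assumed transport: projection)."""
    x, stream = x_server, [0.0] * len(x_server)
    for _ in range(steps):
        g = rgrad(diag, x)
        moved = project(x_server, g)
        stream = [s + m for s, m in zip(stream, moved)]
        x = retract(x, [-lr * gi for gi in g])
    return stream

def server_round(diags, x, responders, steps, lr):
    """Average the responders' streams in the server tangent space, then retract."""
    avg = [0.0] * len(x)
    for j in responders:
        stream = local_stream(diags[j], x, steps, lr)
        avg = [a + s / len(responders) for a, s in zip(avg, stream)]
    return retract(x, [-lr * a for a in avg])

def global_cost(diags, x):
    """Average of the local costs over all agents."""
    return -sum(dot(x, [a * xi for a, xi in zip(d, x)]) for d in diags) / len(diags)

# Hypothetical heterogeneous agents; every local matrix has its largest entry first.
diags = [[2.0, 1.0, 0.5], [1.8, 0.6, 1.0], [2.0, 0.5, 0.8], [1.5, 1.0, 0.7]]
probs = [0.9, 0.5, 0.3, 0.7]  # assumed independent response probabilities
rng = random.Random(0)

x = [1.0 / math.sqrt(3.0)] * 3
initial_cost = global_cost(diags, x)
for _ in range(200):
    responders = [j for j, p in enumerate(probs) if rng.random() < p]
    if not responders:
        continue  # nobody responded this round; the server keeps its iterate
    x = server_round(diags, x, responders, steps=2, lr=0.05)
final_cost = global_cost(diags, x)
```

In this deliberately easy setup every agent shares the same leading eigenvector, so the iterate drifts toward it regardless of who responds. With genuinely conflicting local objectives, the weighting of responders matters, which is the issue Theorem 2.1 addresses.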

Paper Structure

This paper contains 45 sections, 16 theorems, 94 equations, 13 figures, 2 tables, 1 algorithm.

Key Result

Theorem 2.1

Under Assumption sec2:ass1, let $\mathcal{S}_t$ denote the set of agents that respond to the server at the $t$-th round of communication. Then, $\mathbb{E}\left[\sum_{j\in \mathcal{S}_t}\frac{1}{|\mathcal{S}_t|}\mathrm{grad}f_j(x)\right] = \sum_{i=1}^N \tilde{p}_i \mathrm{grad}f_i(x),$ with $\tilde{
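
The statement above is cut off before the weights $\tilde{p}_i$ are defined, so the sketch below does not use the paper's formula. It only illustrates, by simulation, the general fact that a plain average over responders equals a weighted sum of all agents' gradients with weights $\mathbb{E}[\mathbf{1}\{i\in\mathcal{S}_t\}/|\mathcal{S}_t|]$, which follows from linearity of expectation. Whether these weights coincide with the paper's $\tilde{p}_i$ is an assumption. The independent-response model, the resampling of empty rounds, and all names and numbers are hypothetical. Gradients at a common point $x$ lie in one tangent space, so plain lists of floats stand in for them.

```python
import random

def sample_responders(probs, rng):
    """Each agent responds independently with its own probability (assumed model).
    Resample until at least one agent responds so the average is defined."""
    while True:
        responders = [i for i, p in enumerate(probs) if rng.random() < p]
        if responders:
            return responders

def estimate(probs, grads, rounds=20000, seed=0):
    """Monte Carlo estimates of the mean responder-average and of the
    per-agent weights E[1{i in S} / |S|]."""
    rng = random.Random(seed)
    n, d = len(grads), len(grads[0])
    mean_agg = [0.0] * d
    weights = [0.0] * n
    for _ in range(rounds):
        s = sample_responders(probs, rng)
        for j in s:
            weights[j] += 1.0 / len(s)
            for k in range(d):
                mean_agg[k] += grads[j][k] / len(s)
    return [v / rounds for v in mean_agg], [w / rounds for w in weights]

# Hypothetical response probabilities and tangent vectors at a common point x.
probs = [0.9, 0.5, 0.2, 0.7]
grads = [[1.0, 0.0], [0.0, 1.0], [-1.0, 2.0], [0.5, -0.5]]

mean_agg, weights = estimate(probs, grads)
# Weighted sum of ALL agents' gradients using the estimated weights.
weighted = [sum(weights[i] * grads[i][k] for i in range(len(grads)))
            for k in range(len(grads[0]))]
```

The estimated weights sum to one but are not uniform: agents that respond more often receive more weight, so the plain responder average is generally biased relative to the uniform average $\frac{1}{N}\sum_i \mathrm{grad}f_i(x)$.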

Figures (13)

  • Figure 1: PCA: RFedAGS consistently performs better than the competing methods across both synthetic and real datasets.
  • Figure 2: HSP with WordNet dataset. Here "primate" is the test sample (true point).
  • Figure 3: FMC with PATHMNIST dataset: RFedAGS consistently performs better than RFedAvg and RFedSVRG.
  • Figure 4: Sample distributions across different agents on the MNIST dataset. The $x$-axis is the ID of each agent and the $y$-axis is the number of local samples.
  • Figure 5: PEC with non-I.I.D. (slight) MNIST dataset: comparison of the two aggregation patterns (AGS-RS) and (AGS-AP).
  • ...and 8 more figures

Theorems & Definitions (39)

  • Theorem 2.1 (proved in the appendix)
  • Remark 3.1
  • Theorem 3.1
  • Theorem 3.2
  • Remark 3.2
  • Theorem 3.3
  • Theorem 3.4
  • Remark 3.3
  • Theorem 3.5
  • Remark 3.4
  • ...and 29 more