Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs

Yifei Liang, Yan Sun, Xiaochun Cao, Li Shen

TL;DR

A unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm that captures the effect of directed topology, clarifies when Push-Sum correction is necessary compared with standard decentralized SGD, and quantifies how imbalance and mixing jointly shape the best attainable learning performance.

Abstract

Push-Sum-based decentralized learning enables optimization over directed communication networks, where information exchange may be asymmetric. While convergence properties of such methods are well understood, their finite-iteration stability and generalization behavior remain unclear due to structural bias induced by column-stochastic mixing and asymmetric error propagation. In this work, we develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm that captures the effect of directed topology. A key technical ingredient is an imbalance-aware consistency bound for Push-Sum, which controls consensus deviation through two quantities: the stationary distribution imbalance parameter $\delta$ and the spectral gap $(1-\lambda)$ governing mixing speed. This decomposition enables us to disentangle statistical effects from topology-induced bias. We establish finite-iteration stability and optimization guarantees for both convex objectives and non-convex objectives satisfying the Polyak-Łojasiewicz (PŁ) condition. For convex problems, SGP attains excess generalization error of order $\tilde{\mathcal{O}}\!\left(\frac{1}{\sqrt{mn}}+\frac{\gamma}{\delta(1-\lambda)}+\gamma\right)$ under step-size schedules, and we characterize the corresponding optimal early stopping time that minimizes this bound. For PŁ objectives, we obtain convex-like optimization and generalization rates with dominant dependence proportional to $\kappa\!\left(1+\frac{1}{\delta(1-\lambda)}\right)$, revealing a multiplicative coupling between problem conditioning and directed communication topology. Our analysis clarifies when Push-Sum correction is necessary compared with standard decentralized SGD and quantifies how imbalance and mixing jointly shape the best attainable learning performance.
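
To make the role of the Push-Sum correction concrete, the following is a minimal sketch in plain Python. It is not the paper's SGP implementation: it runs only the averaging step (no gradients) on a hypothetical 3-node directed graph. The helper `push_sum`, the matrix `P`, and the initial values are assumptions chosen for illustration, and the convention that `P[i][j]` is the weight node `j` sends to node `i` is also an assumption.

```python
def push_sum(P, x0, steps):
    """Run Push-Sum averaging with a column-stochastic matrix P.

    P[i][j] is the weight that node j sends to node i (assumed convention).
    Returns the raw numerators x, the Push-Sum weights w, and the
    corrected estimates z = x / w.
    """
    n = len(x0)
    x = list(x0)        # numerators, mixed by P
    w = [1.0] * n       # Push-Sum weights, mixed by the same P
    for _ in range(steps):
        x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        w = [sum(P[i][j] * w[j] for j in range(n)) for i in range(n)]
    z = [x[i] / w[i] for i in range(n)]   # de-biased local estimates
    return x, w, z


# Hypothetical directed graph with self-loops and edges 0->1, 0->2, 1->2, 2->0.
# Every column sums to 1 (column-stochastic), but the rows do not.
P = [
    [1 / 3, 0.0, 0.5],
    [1 / 3, 0.5, 0.0],
    [1 / 3, 0.5, 0.5],
]

x, w, z = push_sum(P, [3.0, 6.0, 9.0], steps=100)
print("uncorrected x:", x)   # biased by the graph's stationary distribution
print("weights w:    ", w)
print("corrected z:  ", z)   # each entry approaches the true average
```

In this example the uncorrected values `x` settle at different numbers on different nodes, in proportion to the stationary distribution of `P`, while the ratio `z = x / w` approaches the true average 6 on every node. This is the structural bias from column-stochastic mixing that the abstract refers to, and the division by `w` is the correction that removes it.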

Paper Structure (49 sections, 16 theorems, 171 equations, 4 figures, 2 tables, 1 algorithm)

Key Result

Lemma 2

There exists a non-negative matrix $\bm{P}$ compatible with $\mathcal{G}$ that is doubly stochastic (satisfying $\bm{P}\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^\top \bm{P} = \mathbf{1}^\top$) if and only if the graph $\mathcal{G}$ is balanced.
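
One direction of this lemma can be checked numerically with a small sketch. It assumes the common definition of a balanced graph (every node's in-degree equals its out-degree) and uses one particular construction, $\bm{P} = \bm{I} - \epsilon(\bm{D}_{\text{out}} - \bm{A})$, which is an illustrative choice and not necessarily the one used in the paper. The helper names and both example graphs are hypothetical.

```python
def is_balanced(adj):
    """adj[i][j] = 1 if there is an edge i -> j (no self-loops).
    Balanced here means in-degree == out-degree at every node (assumed definition)."""
    n = len(adj)
    return all(sum(adj[i]) == sum(adj[j][i] for j in range(n)) for i in range(n))


def laplacian_weights(adj):
    """Build P = I - eps * (D_out - A) with eps = 1 / (1 + max out-degree).
    P is non-negative, supported on edges and self-loops, and row-stochastic."""
    n = len(adj)
    out_deg = [sum(row) for row in adj]
    eps = 1.0 / (1 + max(out_deg))
    return [
        [1.0 - eps * out_deg[i] if i == j else eps * adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def column_sums(P):
    n = len(P)
    return [sum(P[i][j] for i in range(n)) for j in range(n)]


# Directed ring 0 -> 1 -> 2 -> 0: balanced.
ring = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
# Edges 0->1, 0->2, 1->2, 2->0: node 0 sends more than it receives.
skewed = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]

for name, adj in [("ring", ring), ("skewed", skewed)]:
    P = laplacian_weights(adj)
    print(name, "balanced:", is_balanced(adj), "column sums:", column_sums(P))
```

For the ring, the column sums all equal 1, so this `P` is doubly stochastic. For the skewed graph, the rows still sum to 1 but the columns do not. That only shows this particular construction fails on the skewed graph; the "only if" direction of the lemma is a statement about every compatible matrix and is not established by the sketch.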

Figures (4)

  • Figure 1: Comparison of symmetric and asymmetric topologies.
  • Figure 2: Classification of Balanced and Unbalanced Graphs.
  • Figure 3: Impacts of generalization and optimization errors on convex objective.
  • Figure 4: Impacts of generalization and optimization errors on non-convex objective.

Theorems & Definitions (38)

  • Definition 1: Balanced Graph [olfati-2004]
  • Lemma 2: Property of Balanced Graph [olfati-2004]
  • Definition 3: Spectral Gap [montenegro-2006]
  • Remark 4
  • Definition 5: Topological Imbalance
  • Remark 6
  • Proposition 7
  • Remark 8
  • Lemma 9: Consistency of Push-Sum
  • Remark 10
  • ...and 28 more