Stability and Generalization of Push-Sum Based Decentralized Optimization over Directed Graphs

Yifei Liang; Yan Sun; Xiaochun Cao; Li Shen

TL;DR

A unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm that captures the effect of directed topology, clarifies when Push-Sum correction is necessary compared with standard decentralized SGD, and quantifies how imbalance and mixing jointly shape the best attainable learning performance.

Abstract

Push-Sum-based decentralized learning enables optimization over directed communication networks, where information exchange may be asymmetric. While convergence properties of such methods are well understood, their finite-iteration stability and generalization behavior remain unclear due to structural bias induced by column-stochastic mixing and asymmetric error propagation. In this work, we develop a unified uniform-stability framework for the Stochastic Gradient Push (SGP) algorithm that captures the effect of directed topology. A key technical ingredient is an imbalance-aware consistency bound for Push-Sum, which controls consensus deviation through two quantities: the stationary distribution imbalance parameter $\delta$ and the spectral gap $(1-\lambda)$ governing mixing speed. This decomposition enables us to disentangle statistical effects from topology-induced bias. We establish finite-iteration stability and optimization guarantees for both convex objectives and non-convex objectives satisfying the Polyak-Łojasiewicz (PŁ) condition. For convex problems, SGP attains excess generalization error of order $\tilde{\mathcal{O}}\left(\frac{1}{\sqrt{mn}}+\frac{\gamma}{\delta(1-\lambda)}+\gamma\right)$ under step-size schedules, and we characterize the corresponding optimal early stopping time that minimizes this bound. For PŁ objectives, we obtain convex-like optimization and generalization rates with dominant dependence proportional to $\kappa\left(1+\frac{1}{\delta(1-\lambda)}\right)$, revealing a multiplicative coupling between problem conditioning and directed communication topology. Our analysis clarifies when Push-Sum correction is necessary compared with standard decentralized SGD and quantifies how imbalance and mixing jointly shape the best attainable learning performance.
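To make the role of the Push-Sum correction concrete, here is a minimal sketch of plain Push-Sum averaging on a small unbalanced directed graph. It covers only the consensus step that SGP builds on and performs no gradient updates. The graph, the equal-split column-stochastic weights, the initial values, and the names `column_stochastic` and `push_sum` are illustrative assumptions, not taken from the paper.

```python
# Illustrative sketch only: graph, weights, and names are assumptions,
# not the paper's experimental setup.

def column_stochastic(out_neighbors):
    """Build P where node j splits its mass equally among its out-neighbors.

    P[i][j] is the weight node j sends to node i, so every column sums to 1.
    """
    n = len(out_neighbors)
    P = [[0.0] * n for _ in range(n)]
    for j, targets in enumerate(out_neighbors):
        share = 1.0 / len(targets)
        for i in targets:
            P[i][j] += share
    return P


def push_sum(P, x0, num_rounds):
    """Run Push-Sum and return (de-biased estimates x / w, raw values x)."""
    n = len(x0)
    x = list(x0)
    w = [1.0] * n  # Push-Sum weights start at 1 on every node
    for _ in range(num_rounds):
        # Both the values and the weights are mixed with the same matrix P.
        x = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        w = [sum(P[i][j] * w[j] for j in range(n)) for i in range(n)]
    return [xi / wi for xi, wi in zip(x, w)], x


# Hypothetical 4-node strongly connected digraph; each list holds the nodes
# that node j sends to (self-loops included).
out_neighbors = [[0, 1, 2], [1, 2], [2, 3], [3, 0]]
P = column_stochastic(out_neighbors)
x0 = [1.0, 2.0, 3.0, 6.0]
target = sum(x0) / len(x0)

estimates, raw_x = push_sum(P, x0, num_rounds=100)
print("target average:", target)
print("x / w         :", [round(z, 6) for z in estimates])
print("raw x         :", [round(v, 6) for v in raw_x])
```

In this toy run the ratio `x / w` on every node approaches the true average, while the raw values `x` settle at node-dependent values skewed by the stationary distribution of `P`. That skew is the structural bias of column-stochastic mixing that the abstract refers to.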

Paper Structure (49 sections, 16 theorems, 171 equations, 4 figures, 2 tables, 1 algorithm)

This paper contains 49 sections, 16 theorems, 171 equations, 4 figures, 2 tables, 1 algorithm.

Introduction
Related work
Preliminaries
Problem Formulation
Communication Topology and Structural Properties
Algorithm: Stochastic Gradient Push (SGP)
Stability and Generalization Measures
Theoretical Analysis
Basic Assumptions
Results on Convex Case
Results on Non-convex Case
Discussion on the Impact of Topology
Experiment
Logistic Regression on a9a Dataset
LeNet on CIFAR-10 Dataset
...and 34 more sections

Key Result

Lemma 2

There exists a non-negative matrix $\bm{P}$ compatible with $\mathcal{G}$ that is doubly stochastic (satisfying $\bm{P}\mathbf{1} = \mathbf{1}$ and $\mathbf{1}^\top \bm{P} = \mathbf{1}^\top$) if and only if graph $\mathcal{G}$ is balanced.
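As a small illustration of the lemma, the sketch below checks degree balance and builds one natural mixing matrix on two toy graphs. It assumes unit edge weights, so "balanced" here means in-degree equals out-degree at every node (the paper's Definition 1 may be stated more generally), and it assumes self-loops count as compatible with $\mathcal{G}$. The function names and graphs are hypothetical. It demonstrates a single construction and is not a proof of either direction of the equivalence.

```python
# Illustrative sketch only: unit edge weights, allowed self-loops, and all
# names here are assumptions made for this example.

def degrees(edges, n):
    """Return (in_degree, out_degree) lists for directed edges (u, v) = u -> v."""
    in_deg = [0] * n
    out_deg = [0] * n
    for u, v in edges:
        out_deg[u] += 1
        in_deg[v] += 1
    return in_deg, out_deg


def is_degree_balanced(edges, n):
    """True if every node has equal in-degree and out-degree."""
    in_deg, out_deg = degrees(edges, n)
    return in_deg == out_deg


def uniform_mixing_matrix(edges, n):
    """P[v][u] = 1 / (d_max + 1) on each edge u -> v; the diagonal keeps the rest.

    Columns always sum to 1. Rows sum to 1 exactly when in-degree equals
    out-degree at every node.
    """
    _, out_deg = degrees(edges, n)
    scale = 1.0 / (max(out_deg) + 1)
    P = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        P[v][u] = scale
    for u in range(n):
        P[u][u] = 1.0 - out_deg[u] * scale
    return P


def row_sums(P):
    return [sum(row) for row in P]


def col_sums(P):
    return [sum(col) for col in zip(*P)]


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]   # balanced: every degree is 1
ring_plus_chord = ring + [(0, 2)]         # node 0 sends more than it receives

for name, edges in [("ring", ring), ("ring + chord", ring_plus_chord)]:
    P = uniform_mixing_matrix(edges, 4)
    print(name, "balanced:", is_degree_balanced(edges, 4))
    print("  row sums:", [round(s, 3) for s in row_sums(P)])
    print("  col sums:", [round(s, 3) for s in col_sums(P)])
```

On the ring the constructed matrix is doubly stochastic. On the ring with an extra chord it stays column-stochastic but its row sums drift away from 1, which is the situation where Push-Sum style correction becomes relevant.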

Figures (4)

Figure 1: Comparison of symmetric and asymmetric topologies.
Figure 2: Classification of Balanced and Unbalanced Graphs.
Figure 3: Impacts of generalization and optimization errors on convex objective.
Figure 4: Impacts of generalization and optimization errors on non-convex objective.

Theorems & Definitions (38)

Definition 1: Balanced Graph (olfati-2004)
Lemma 2: Property of Balanced Graph (olfati-2004)
Definition 3: Spectral Gap (montenegro-2006)
Remark 4
Definition 5: Topological Imbalance
Remark 6
Proposition 7
Remark 8
Lemma 9: Consistency of Push-Sum
Remark 10
...and 28 more
