On the Convergence of Decentralized Stochastic Gradient-Tracking with Finite-Time Consensus

Aaron Fainman, Stefan Vlaski

TL;DR

This paper tackles decentralized optimization with gradient tracking when finite-time consensus (FTC) sequences are only approximately realized. By developing a two-timescale analysis for a periodic Aug-DGM algorithm, it derives mean-square-deviation (MSD) bounds that account for the FTC approximation error $\epsilon_\tau$, the sequence length $\tau$, gradient noise, and network heterogeneity. The analysis reveals a two-timescale dynamic: the centroid error decays at every iteration, while the consensus error evolves across FTC sequences of length $\tau$. Combining the two yields a coupled bound with an amortized convergence rate of $1-\Theta(\nu\mu)+\Theta(\nu\epsilon_\tau\mu)$ and an error bound that scales with $\tau^2$ and $\epsilon_\tau$ via $\gamma$. Simulations on path and hypercube graphs show graceful degradation as $\epsilon_\tau$ grows and a nuanced tradeoff between $\tau$ and $\epsilon_\tau$, demonstrating that approximate FTC can still yield substantial gains in sparse networks where exact FTC is impractical.

Abstract

Algorithms for decentralized optimization and learning rely on local optimization steps coupled with combination steps over a graph. Recent works have demonstrated that using a time-varying sequence of matrices that achieves finite-time consensus can improve the communication and iteration complexity of decentralized optimization algorithms based on gradient tracking. In practice, a sequence of matrices satisfying the exact finite-time consensus property may not be available due to imperfect knowledge of the network topology, a limit on the length of the sequence, or numerical instabilities. In this work, we quantify the impact of approximate finite-time consensus sequences on the convergence of a gradient-tracking based decentralized optimization algorithm. Our results hold for any periodic sequence of combination matrices. We clarify the interplay between approximation error of the finite-time consensus sequence and the length of the sequence as well as typical problem parameters such as smoothness and gradient noise.
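To make the "local step plus combination step" structure concrete, the following is a minimal sketch of a gradient-tracking recursion driven by a periodic sequence of combination matrices. Several things here are assumptions rather than details from the paper: the recursion is written in the adapt-then-combine form commonly associated with Aug-DGM, the local costs are scalar quadratics $f_k(w) = \tfrac{1}{2}(w - b_k)^2$, gradients are noiseless (the paper treats the stochastic case), and the one-peer hypercube matrices serve as an example of an exact FTC sequence. The names `one_peer_hypercube`, `combine`, and `aug_dgm` are hypothetical.

```python
def one_peer_hypercube(tau):
    """Build tau combination matrices on K = 2**tau agents.

    In matrix `bit`, agent k averages with the single peer whose index
    differs from k in that bit. Applying all tau matrices in sequence
    averages over all K agents exactly.
    """
    K = 2 ** tau
    mats = []
    for bit in range(tau):
        W = [[0.0] * K for _ in range(K)]
        for k in range(K):
            W[k][k] = 0.5
            W[k][k ^ (1 << bit)] = 0.5
        mats.append(W)
    return mats


def combine(W, v):
    """Combination step: each agent forms a weighted sum of its neighbors' values."""
    K = len(v)
    return [sum(W[k][l] * v[l] for l in range(K)) for k in range(K)]


def aug_dgm(mats, targets, mu, num_iters):
    """Gradient tracking with a periodic matrix sequence (assumed Aug-DGM form).

    Local cost of agent k (an illustrative assumption): 0.5 * (w - targets[k])**2,
    so the network-wide minimizer is the mean of `targets`.
    """
    K = len(targets)

    def grad(w):
        return [w[k] - targets[k] for k in range(K)]

    w = [0.0] * K
    g = grad(w)
    y = list(g)  # tracking variable starts at the local gradients
    for i in range(num_iters):
        W = mats[i % len(mats)]  # cycle through the periodic sequence
        # Local descent along the tracked gradient, then combine.
        w_new = combine(W, [w[k] - mu * y[k] for k in range(K)])
        g_new = grad(w_new)
        # Update the tracker with the gradient increment, then combine.
        y = combine(W, [y[k] + g_new[k] - g[k] for k in range(K)])
        w, g = w_new, g_new
    return w


estimates = aug_dgm(one_peer_hypercube(2), [1.0, 2.0, 3.0, 6.0], mu=0.1, num_iters=300)
print(estimates)
```

In this toy setting every agent should end up close to the mean of the targets. Swapping in an inexact sequence of matrices is one way to experiment with the effect of approximate FTC on the same recursion.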

Paper Structure

This paper contains 21 sections, 6 theorems, 77 equations, 4 figures, 1 table.

Key Result

Lemma 1

Under the combination-matrix assumption (assump:comb-mat), it holds that $\epsilon_\tau<1$.
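As a rough numerical illustration of what an approximation error below one can look like, the sketch below builds a deliberately perturbed hypercube sequence and measures how far the product of one full period is from the exact averaging matrix $\frac{1}{K}\mathbb{1}\mathbb{1}^{\mathsf{T}}$. The definition of $\epsilon_\tau$ is not reproduced in this summary, so the Frobenius-norm gap used here is an assumed stand-in (it upper-bounds the spectral-norm gap), and the perturbation model and all function names are hypothetical.

```python
def perturbed_hypercube(tau, delta):
    """One-peer hypercube matrices with self-weight 0.5 + delta.

    delta = 0 recovers the exact finite-time consensus sequence; a nonzero
    delta (an illustrative perturbation) keeps each matrix symmetric and
    doubly stochastic but breaks exact averaging.
    """
    K = 2 ** tau
    mats = []
    for bit in range(tau):
        W = [[0.0] * K for _ in range(K)]
        for k in range(K):
            W[k][k] = 0.5 + delta
            W[k][k ^ (1 << bit)] = 0.5 - delta
        mats.append(W)
    return mats


def matmul(A, B):
    """Plain dense matrix product for square lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def sequence_product(mats):
    """Product of the matrices over one period, applied first to last."""
    P = mats[0]
    for W in mats[1:]:
        P = matmul(W, P)
    return P


def frobenius_gap(P):
    """Frobenius norm of P - (1/K) 1 1^T, used as a proxy for the FTC error."""
    K = len(P)
    return sum((P[i][j] - 1.0 / K) ** 2
               for i in range(K) for j in range(K)) ** 0.5


exact_gap = frobenius_gap(sequence_product(perturbed_hypercube(3, 0.0)))
approx_gap = frobenius_gap(sequence_product(perturbed_hypercube(3, 0.05)))
print(exact_gap, approx_gap)
```

With no perturbation the gap vanishes, while a small perturbation gives a small positive gap that stays well below one for this example.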

Figures (4)

  • Figure 1: The two-timescale behavior of the algorithm depicted with real data. The consensus error $\widehat{\boldsymbol{\mathcal{X}}}_i$ decays over sequences of length $\tau$ and is bounded in the section on the consensus error. The centroid error, $\widetilde{\boldsymbol{w}}_{c,i}$, depends on the previous iterate alone and is bounded in the section on the centroid error. The two errors are combined into the main result in the main-results section.
  • Figure 2: MSD for a path graph with 8 agents demonstrating the increasing steady-state error with increasing $\epsilon_\tau$. The deterioration in performance is graceful, indicating that Aug-DGM is robust to inaccuracies in the FTC sequence.
  • Figure 3: MSD for graphs with the same number of agents ($K=16$) but different values of $\tau$, demonstrating how performance deteriorates as $\tau$ increases.
  • Figure 4: (a) Hypercube

Theorems & Definitions (11)

  • Lemma 1: Approximation Error Bound
  • proof
  • Lemma 2: Perturbation Bounds
  • proof
  • Lemma 3: Consensus Error
  • proof
  • Lemma 4: Centroid Error
  • proof
  • Theorem 1
  • Lemma 5
  • ...and 1 more