On the Convergence of Decentralized Stochastic Gradient-Tracking with Finite-Time Consensus
Aaron Fainman, Stefan Vlaski
TL;DR
This paper studies decentralized optimization with gradient tracking when finite-time consensus (FTC) sequences are only approximately realized. Using a two-timescale analysis of a periodic Aug-DGM algorithm, it derives mean-square deviation (MSD) bounds that account for the FTC approximation error $\epsilon_\tau$, the sequence length $\tau$, gradient noise, and network heterogeneity. The centroid error decays at every iteration, while the consensus error drifts across FTC sequences; coupling the two yields an amortized convergence rate of $1-\Theta(\nu\mu)+\Theta(\nu\epsilon_\tau\mu)$ and an MSD bound that scales with $\tau^2$ and $\epsilon_\tau$ through $\gamma$. Simulations on path and hypercube graphs show graceful degradation as $\epsilon_\tau$ grows and a nuanced tradeoff between $\tau$ and $\epsilon_\tau$, indicating that approximate FTC can still yield substantial gains in sparse networks where exact FTC is impractical.
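As a rough illustration of the kind of algorithm summarized above, the following sketch runs a gradient-tracking recursion with a periodic sequence of combination matrices. It uses the commonly stated Aug-DGM form (combine after adapting, for both the iterate and the tracking variable), which may differ in detail from the recursion analyzed in the paper. The scalar quadratic costs, the one-peer hypercube matrices, the step size, the noise level, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def one_peer_hypercube(dim):
    """Return `dim` doubly stochastic matrices for K = 2**dim agents.

    In matrix b, agent k averages equally with the agent whose index
    differs from k in bit b.
    """
    K = 2 ** dim
    mats = []
    for b in range(dim):
        W = np.zeros((K, K))
        for k in range(K):
            W[k, k] = 0.5
            W[k, k ^ (1 << b)] = 0.5
        mats.append(W)
    return mats

def run_gradient_tracking(mats, targets, step, n_iter, noise_std, rng):
    """Gradient tracking with combination matrices cycled periodically.

    Agent k holds the (assumed) cost f_k(w) = 0.5 * (w - targets[k])**2,
    so the minimizer of the aggregate cost is targets.mean().
    Returns the squared deviation from that minimizer, averaged over
    agents, at the final iterate (a single-realization proxy for the MSD).
    """
    K = targets.size
    x = np.zeros(K)
    g = (x - targets) + noise_std * rng.normal(size=K)  # noisy local gradients
    y = g.copy()                                        # tracking variable
    for i in range(n_iter):
        W = mats[i % len(mats)]          # periodic sequence of length tau
        x = W @ (x - step * y)           # adapt, then combine
        g_new = (x - targets) + noise_std * rng.normal(size=K)
        y = W @ (y + g_new - g)          # track the average gradient
        g = g_new
    return np.mean((x - targets.mean()) ** 2)

rng = np.random.default_rng(0)
mats = one_peer_hypercube(3)             # K = 8 agents, tau = 3
targets = rng.normal(size=8)             # heterogeneous local minimizers

msd_clean = run_gradient_tracking(mats, targets, 0.1, 300, 0.0, rng)
msd_noisy = run_gradient_tracking(mats, targets, 0.1, 300, 0.1, rng)
print(msd_clean, msd_noisy)
```

Without gradient noise the iterates approach the exact minimizer, while gradient noise leaves a small residual deviation; this toy setup is not meant to reproduce the paper's bounds or simulations.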
Abstract
Algorithms for decentralized optimization and learning rely on local optimization steps coupled with combination steps over a graph. Recent works have demonstrated that using a time-varying sequence of matrices that achieves finite-time consensus can improve the communication and iteration complexity of decentralized optimization algorithms based on gradient tracking. In practice, a sequence of matrices satisfying the exact finite-time consensus property may not be available due to imperfect knowledge of the network topology, a limit on the length of the sequence, or numerical instabilities. In this work, we quantify the impact of approximate finite-time consensus sequences on the convergence of a gradient-tracking based decentralized optimization algorithm. Our results hold for any periodic sequence of combination matrices. We clarify the interplay between approximation error of the finite-time consensus sequence and the length of the sequence as well as typical problem parameters such as smoothness and gradient noise.
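To make the finite-time consensus property concrete, the sketch below builds the one-peer hypercube sequence, a standard example in which the product of $\log_2 K$ sparse matrices equals exact averaging over $K$ agents, and then perturbs it to mimic an approximate sequence. Measuring the approximation error as the spectral norm of the product minus the averaging matrix is an assumption made here for illustration, as are the perturbation model and all function names; the paper's precise definition of $\epsilon_\tau$ may differ.

```python
import numpy as np

def one_peer_hypercube(dim):
    """Return `dim` doubly stochastic matrices for K = 2**dim agents.

    In matrix b, agent k averages equally with the agent whose index
    differs from k in bit b.
    """
    K = 2 ** dim
    mats = []
    for b in range(dim):
        W = np.zeros((K, K))
        for k in range(K):
            W[k, k] = 0.5
            W[k, k ^ (1 << b)] = 0.5
        mats.append(W)
    return mats

def ftc_error(mats):
    """Spectral-norm gap between the product of `mats` and exact averaging."""
    K = mats[0].shape[0]
    prod = np.eye(K)
    for W in mats:
        prod = W @ prod
    return np.linalg.norm(prod - np.ones((K, K)) / K, ord=2)

def perturb(W, delta):
    """Mix W with the identity (hypothetical perturbation model).

    The result is still doubly stochastic, but the sequence no longer
    reaches exact consensus in finite time.
    """
    return (1 - delta) * W + delta * np.eye(W.shape[0])

exact = one_peer_hypercube(3)                  # K = 8 agents, tau = 3
approx = [perturb(W, 0.05) for W in exact]

eps_exact = ftc_error(exact)    # zero up to floating-point round-off
eps_approx = ftc_error(approx)  # nonzero: only approximate consensus
print(eps_exact, eps_approx)
```

In this toy case the exact sequence has zero error up to round-off, and mixing each matrix with the identity by a factor of 0.05 produces an error of about 0.05, giving a simple handle for experimenting with approximate sequences.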
