Accelerated Gradient Tracking over Time-varying Graphs for Decentralized Optimization

Huan Li, Zhouchen Lin

TL;DR

This work tackles decentralized convex optimization over time-varying graphs by extending accelerated gradient tracking (Acc-GT) to non-static network topologies. The authors analyze a single-loop Acc-GT and establish convergence rates that scale with the network parameters γ and σ_γ and with the problem constants L and μ, covering both nonstrongly and strongly convex objectives. They additionally propose Chebyshev acceleration and a multiple-consensus subroutine to further improve the dependence on network connectivity, matching lower bounds on static graphs and offering practical variants for time-varying networks. The approach is validated on a decentralized logistic regression task (CIFAR-10) where Acc-GT outperforms baselines in both objective accuracy and consensus quality, highlighting its potential for scalable federated-style distributed optimization. The results advance the understanding of single-loop accelerated methods in dynamic networks and open avenues for extensions to random or directed time-varying graphs.

Abstract

Decentralized optimization over time-varying graphs has become increasingly common in modern machine learning, where massive data are stored on millions of mobile devices, as in federated learning. This paper revisits the widely used accelerated gradient tracking and extends it to time-varying graphs. We prove that the practical single-loop accelerated gradient tracking needs $O\left(\left(\frac{\gamma}{1-\sigma_\gamma}\right)^2\sqrt{\frac{L}{\epsilon}}\right)$ and $O\left(\left(\frac{\gamma}{1-\sigma_\gamma}\right)^{1.5}\sqrt{\frac{L}{\mu}}\log\frac{1}{\epsilon}\right)$ iterations to reach an $\epsilon$-optimal solution over time-varying graphs when the problems are nonstrongly convex and strongly convex, respectively, where $\gamma$ and $\sigma_\gamma$ are two common constants characterizing the network connectivity, $L$ and $\mu$ are the smoothness and strong convexity constants, respectively, and one iteration corresponds to one gradient oracle call and one communication round. Our convergence rates improve significantly over the ones of $O\left(\frac{1}{\epsilon^{5/7}}\right)$ and $O\left(\left(\frac{L}{\mu}\right)^{5/7}\frac{1}{(1-\sigma)^{1.5}}\log\frac{1}{\epsilon}\right)$, respectively, which were proved in the original literature on accelerated gradient tracking only for static graphs, where $\frac{\gamma}{1-\sigma_\gamma}$ equals $\frac{1}{1-\sigma}$ when the network is time-invariant. When combined with a multiple consensus subroutine, the dependence on the network connectivity constants can be further improved to $O(1)$ and $O\left(\frac{\gamma}{1-\sigma_\gamma}\right)$ for the gradient oracle and communication round complexities, respectively. When the network is static, by employing the Chebyshev acceleration, our complexities exactly match the lower bounds without hiding any poly-logarithmic factor for both nonstrongly convex and strongly convex problems.
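To make the setting concrete, the following is a minimal sketch of plain (non-accelerated) gradient tracking over a time-varying graph. It is not the accelerated method analyzed in the paper. The toy quadratic objectives, the alternating pairwise graphs, the step size, and all function names are assumptions made for illustration. The helper `sigma_of_window` also assumes one common convention for the connectivity constants: $\gamma$ is a window length, and $\sigma_\gamma$ is the spectral norm of the product of $\gamma$ consecutive mixing matrices minus the averaging matrix.

```python
import numpy as np

def pair_averaging_matrix(m, pairs):
    """Doubly stochastic matrix that averages the values held by each listed pair."""
    W = np.eye(m)
    for i, j in pairs:
        W[i, i] = W[j, j] = 0.5
        W[i, j] = W[j, i] = 0.5
    return W

def sigma_of_window(Ws):
    """Spectral norm of (product of the given mixing matrices) minus the averaging matrix."""
    m = Ws[0].shape[0]
    product = np.eye(m)
    for W in Ws:
        product = W @ product
    return np.linalg.norm(product - np.ones((m, m)) / m, 2)

def gradient_tracking(grad, x0, mixing, alpha, num_iters):
    """Plain gradient tracking; mixing(k) returns the mixing matrix used at iteration k."""
    x = x0.copy()
    g = grad(x)
    s = g.copy()  # the tracker starts at the local gradients
    for k in range(num_iters):
        W = mixing(k)
        x_next = W @ x - alpha * s   # mix with neighbours, then step along the tracker
        g_next = grad(x_next)
        s = W @ s + g_next - g       # track the network-average gradient
        x, g = x_next, g_next
    return x

# Hypothetical toy problem: agent i holds f_i(x) = 0.5 * ||x - b_i||^2,
# so the minimizer of the sum is the mean of the b_i.
rng = np.random.default_rng(0)
m, d = 4, 3
b = rng.normal(size=(m, d))
grad = lambda x: x - b  # row i is the gradient of f_i at agent i's iterate

# Two graphs that are each disconnected but whose union is a ring.
W_even = pair_averaging_matrix(m, [(0, 1), (2, 3)])
W_odd = pair_averaging_matrix(m, [(1, 2), (3, 0)])
mixing = lambda k: W_even if k % 2 == 0 else W_odd

x_final = gradient_tracking(grad, np.zeros((m, d)), mixing, alpha=0.1, num_iters=400)
x_star = b.mean(axis=0)
print("sigma of a single graph:", sigma_of_window([W_even]))
print("sigma over a window of 2:", sigma_of_window([W_even, W_odd]))
print("max distance to the optimum:", np.abs(x_final - x_star).max())
```

In this toy setup neither graph alone mixes information across the whole network, yet the product over a window of two rounds does, which is the kind of situation the windowed connectivity constants are meant to capture.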

Paper Structure

This paper contains 20 sections, 19 theorems, 124 equations, 4 figures, 2 tables, 1 algorithm.

Key Result

Theorem 1

Suppose that Assumption assumption_f holds with $\mu=0$ and Assumption assumption_w_tv holds for the sequence $\{W^k\}_{k=0}^{T\gamma}$. Let the sequence $\{\theta_k\}_{k=0}^{T\gamma}$ satisfy $\frac{1-\theta_k}{\theta_k^2}=\frac{1}{\theta_{k-1}^2}$ with $\theta_0=1$, let $\alpha\leq\frac{(1-\sigma_{\ldots})\ldots}{\ldots}$ … and where $C=\|\overline z^0-x^*\|^2+\frac{\alpha(1-\sigma_{\gamma})}{10mL\gamma}\max_{r=0,\ldots}\ldots$
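The recursion for $\theta_k$ in Theorem 1 can be solved in closed form at each step: rearranging $\frac{1-\theta_k}{\theta_k^2}=\frac{1}{\theta_{k-1}^2}$ gives the quadratic $\theta_k^2+\theta_{k-1}^2\theta_k-\theta_{k-1}^2=0$, whose positive root is $\theta_k=\frac{\sqrt{\theta_{k-1}^4+4\theta_{k-1}^2}-\theta_{k-1}^2}{2}$. The sketch below generates the sequence and checks the standard bounds $\frac{1}{k+1}\leq\theta_k\leq\frac{2}{k+2}$, which are a well-known property of this recursion rather than a statement taken from the theorem. The function name `theta_sequence` is hypothetical.

```python
import math

def theta_sequence(num_terms):
    """Return [theta_0, ..., theta_{num_terms-1}] with theta_0 = 1 and
    (1 - theta_k) / theta_k**2 == 1 / theta_{k-1}**2 for k >= 1."""
    thetas = [1.0]
    for _ in range(1, num_terms):
        prev_sq = thetas[-1] ** 2
        # Positive root of theta^2 + prev_sq * theta - prev_sq = 0.
        thetas.append((math.sqrt(prev_sq ** 2 + 4.0 * prev_sq) - prev_sq) / 2.0)
    return thetas

thetas = theta_sequence(50)
for k in (0, 1, 2, 10, 49):
    print(k, thetas[k], 1.0 / (k + 1), 2.0 / (k + 2))
```

The upper bound $\theta_k\leq\frac{2}{k+2}$ is what turns the $1/\theta_k^2$ growth of such sequences into the familiar accelerated $O(1/k^2)$ decay in the nonstrongly convex case.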

Figures (4)

  • Figure 1: Comparisons of the objective function errors (left) and consensus errors (right) with respect to the number of communication (top) and computation (bottom) rounds for strongly convex problem with $d=2$.
  • Figure 2: Comparisons of the objective function errors (left) and consensus errors (right) with respect to the number of communication (top) and computation (bottom) rounds for strongly convex problem with $d=20$.
  • Figure 3: Comparisons of the objective function errors (left) and consensus errors (right) with respect to the number of communication (top) and computation (bottom) rounds for nonstrongly convex problem with $d=2$.
  • Figure 4: Comparisons of the objective function errors (left) and consensus errors (right) with respect to the number of communication (top) and computation (bottom) rounds for nonstrongly convex problem with $d=20$.

Theorems & Definitions (46)

  • Definition 1
  • Theorem 1
  • Theorem 2
  • Remark 1
  • Remark 2
  • Remark 3
  • Remark 4
  • Remark 5
  • Theorem 3
  • Theorem 4
  • ...and 36 more