On graphs with finite-time consensus and their use in gradient tracking

Edward Duc Hien Nguyen, Xin Jiang, Bicheng Ying, César A. Uribe

TL;DR

This work addresses decentralized optimization of $f(x)=\frac{1}{n}\sum_{i=1}^n f_i(x)$ under deterministic sequences of graphs that satisfy finite-time consensus, enabling exact averaging after $\tau$ steps. It introduces Gradient Tracking for Finite-Time Consensus Topologies (GT-FT), which restricts gradient-tracking updates to topology sequences with finite-time averaging, and provides nonconvex convergence guarantees under stochastic gradients with a stepsize $\alpha$ in $(0,1/(4\sqrt{6}\tau^2 L)]$. The authors derive explicit weight-matrix representations for one-peer exponential graphs and for $p$-peer hyper-cuboids, show finite-time consensus for these sequences, and establish a connection to de Bruijn graphs, broadening the class of sparse, scalable topologies available for decentralized optimization. Numerical experiments demonstrate that GT-FT attains the same iteration complexity as GT with static topologies while offering substantially lower communication costs due to sparsity, illustrating practical benefits for large-scale and resource-constrained networks.
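
To make the sparsity argument concrete, the following minimal sketch compares one period of one-peer exponential averaging with the same number of rounds on a static exponential graph for $n=8$. It rests on assumptions that are not spelled out on this page: each one-peer round averages node $i$ with the node at offset $2^{l \bmod \tau}$ using weights $1/2$, and the static graph uses uniform weights $1/(\tau+1)$ over all $\tau$ offsets. The helper names are hypothetical, and messages are counted as one per directed link per round.

```python
from fractions import Fraction

def one_peer_step(v, l, tau):
    """One round on the l-th one-peer exponential graph (assumed weights 1/2, 1/2)."""
    n = len(v)
    hop = 2 ** (l % tau)
    return [(v[i] + v[(i + hop) % n]) / 2 for i in range(n)]

def static_step(v, tau):
    """One round on the static exponential graph (assumed uniform weights 1/(tau+1))."""
    n = len(v)
    return [(v[i] + sum(v[(i + 2 ** j) % n] for j in range(tau))) / (tau + 1)
            for i in range(n)]

def consensus_error(v):
    """Largest deviation of any agent's value from the network average."""
    mean = sum(v) / len(v)
    return max(abs(x - mean) for x in v)

n = 8
tau = (n - 1).bit_length()               # ceil(log2(n)) with integer arithmetic
v_one = [Fraction(i) for i in range(n)]  # exact rationals, so "exact averaging" is testable
v_static = list(v_one)
msgs_one = msgs_static = 0

for l in range(tau):
    v_one = one_peer_step(v_one, l, tau)
    msgs_one += n                        # each node receives from one peer
    v_static = static_step(v_static, tau)
    msgs_static += n * tau               # each node receives from tau peers

print("one-peer: error", consensus_error(v_one), "messages", msgs_one)
print("static:   error", consensus_error(v_static), "messages", msgs_static)
```

Under these assumptions the one-peer sequence reaches the exact average after $\tau$ rounds with a third of the messages, while the static graph still has a nonzero consensus error.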

Abstract

This paper studies sequences of graphs satisfying the finite-time consensus property (i.e., iterating through such a finite sequence is equivalent to performing global or exact averaging) and their use in Gradient Tracking. We provide an explicit weight matrix representation of the studied sequences and prove their finite-time consensus property. Moreover, we incorporate the studied finite-time consensus topologies into Gradient Tracking and present a new algorithmic scheme called Gradient Tracking for Finite-Time Consensus Topologies (GT-FT). We analyze the new scheme for nonconvex problems with stochastic gradient estimates. Our analysis shows that the convergence rate of GT-FT does not depend on the heterogeneity of the agents' functions or the connectivity of any individual graph in the topology sequence. Furthermore, owing to the sparsity of the graphs, GT-FT requires lower communication costs than Gradient Tracking using the static counterpart of the topology sequence.
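
As an illustration of how a finite-time consensus sequence can drive gradient tracking, here is a minimal sketch on a toy problem. It uses a generic adapt-then-combine gradient-tracking recursion with exact gradients, which is an assumption and not necessarily the exact GT-FT update of the paper. The local objectives $f_i(x) = \tfrac{1}{2}(x-b_i)^2$ (so $L=1$), the values $b_i$, and the helper names are hypothetical; only the stepsize bound $1/(4\sqrt{6}\tau^2 L)$ is taken from the text.

```python
from math import sqrt

def one_peer_mix(v, l, tau):
    """Average each agent with its peer at offset 2**(l mod tau) (assumed weights 1/2)."""
    n = len(v)
    hop = 2 ** (l % tau)
    return [0.5 * v[i] + 0.5 * v[(i + hop) % n] for i in range(n)]

def local_grads(x, b):
    """Gradients of the toy objectives f_i(x) = 0.5 * (x - b_i)**2."""
    return [x[i] - b[i] for i in range(len(x))]

n, tau, L = 8, 3, 1.0
b = [float(i) for i in range(n)]          # heterogeneous local minimizers
alpha = 1.0 / (4 * sqrt(6) * tau**2 * L)  # largest stepsize allowed by the stated bound

x = [0.0] * n                             # one scalar iterate per agent
g = local_grads(x, b)
y = list(g)                               # tracker starts at the local gradients

for k in range(3000):
    # Adapt-then-combine: local step along the tracker, then one sparse mixing round.
    x_new = one_peer_mix([x[i] - alpha * y[i] for i in range(n)], k, tau)
    g_new = local_grads(x_new, b)
    # Tracker update: mix the old tracker plus the change in local gradients.
    y = one_peer_mix([y[i] + g_new[i] - g[i] for i in range(n)], k, tau)
    x, g = x_new, g_new

target = sum(b) / n                       # minimizer of the average objective
print("max distance to minimizer:", max(abs(xi - target) for xi in x))
```

In this toy setting every agent converges to the minimizer of the average objective even though the local minimizers $b_i$ differ, which is the heterogeneity-independent behavior the abstract describes.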

Paper Structure

This paper contains 30 sections, 10 theorems, 76 equations, 5 figures, 1 table, and 1 algorithm.

Key Result

Proposition 3.1

Given $n \in \mathbb N_{\geq 2}$, let $\tau = \lceil \log_2 (n) \rceil$, and let $\{W^{(l)}\}_{l \in \mathbb N} \subset \mathbb R^{n \times n}$ be the weight matrices defined in \ref{def:exp-mat}. Each matrix $W^{(l)}$ is circulant and doubly stochastic, i.e., $W^{(l)} \mathds{1} = \mathds{1}$ and $\mathds{1}^T W^{(l)} = \mathds{1}^T$ [...]
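
The circulant structure can be checked numerically. The sketch below assumes the usual one-peer exponential weights (entries $1/2$ on the diagonal and at offset $2^{l \bmod \tau}$), since the definition itself is not reproduced on this page, and the function names are hypothetical. A circulant matrix is represented by its first row, so a product of circulant matrices reduces to a cyclic convolution of first rows.

```python
from fractions import Fraction

def first_row(n, l):
    """First row of the assumed one-peer exponential matrix W^(l)."""
    tau = (n - 1).bit_length()           # ceil(log2(n)) for n >= 2
    hop = 2 ** (l % tau)
    row = [Fraction(0)] * n
    row[0] += Fraction(1, 2)
    row[hop % n] += Fraction(1, 2)
    return row

def circulant(row):
    """Dense circulant matrix whose (i, j) entry is row[(j - i) mod n]."""
    n = len(row)
    return [[row[(j - i) % n] for j in range(n)] for i in range(n)]

def cyclic_conv(a, b):
    """First row of the product of the circulant matrices generated by a and b."""
    n = len(a)
    return [sum(a[k] * b[(j - k) % n] for k in range(n)) for j in range(n)]

def product_first_row(n):
    """First row of W^(tau-1) ... W^(0); circulant matrices commute, so order is irrelevant."""
    tau = (n - 1).bit_length()
    row = first_row(n, 0)
    for l in range(1, tau):
        row = cyclic_conv(first_row(n, l), row)
    return row

W = circulant(first_row(8, 1))
rows_ok = all(sum(r) == 1 for r in W)
cols_ok = all(sum(W[i][j] for i in range(8)) == 1 for j in range(8))
print("doubly stochastic:", rows_ok and cols_ok)
print("n = 8:", product_first_row(8))    # uniform row means exact averaging
print("n = 6:", product_first_row(6))    # non-uniform row: no exact averaging
```

With exact rational arithmetic, the product for $n=8$ has a uniform first row, whereas for $n=6$ it does not, consistent with the remark in the Figure 3 caption that one-peer graphs only achieve finite-time consensus when the number of agents is a power of two.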

Figures (5)

  • Figure 1: The three one-peer exponential graphs $\{\mathcal{G}^{(l)}\}_{l=0}^2$ with $n=8$ and $\tau = 3$. Note that all nodes have self-loops, though not explicitly shown in the figure.
  • Figure 2: The three $2$-peer hyper-cuboids $\{\mathcal{G}^{(l)}\}_{l=0}^2$ with $n=12$, $(p_2,p_1,p_0) = (2,2,3)$, and $\tau=3$. Note that all nodes have self-loops, though not explicitly shown here.
  • Figure 3: Consensus error versus the number of iterations. The legend is composed of three parts. The first part is either "Static", "1-P", or "$p$-P", standing for static graphs, one-peer time-varying graphs, and $p$-peer time-varying graphs, respectively. The second part of the legend describes the graph type: exponential, hyper-cube, de Bruijn, or hyper-cuboid. The third part is the number of agents. All graphs satisfying Definition \ref{def:fin-time-cons} are plotted with solid lines, while others are plotted with dashed lines. The static variants do not achieve finite-time consensus. The 1-P time-varying graphs only achieve finite-time consensus when the number of agents is a power of two. The $p$-P time-varying graphs always achieve finite-time consensus. The sizes of the split network components used in Base-$(k+1)$ graphs \cite{takezawa2023exponential} are $(16, 8)$ for 24 nodes, $(32, 4)$ for 36 nodes, and $(64, 8)$ for 72 nodes, respectively.
  • Figure 4: Comparison of the use of one-peer exponential graphs and static exponential graphs in decentralized algorithms. One-peer exponential graphs are used in GT-FT and DGD, and static exponential graphs are used in TV-GT and DGD.
  • Figure 5: Comparison of the use of $p$-peer hyper-cuboids and static hyper-cuboids in decentralized optimization algorithms. $p$-peer hyper-cuboids are used in GT-FT and DGD, and static hyper-cuboids are used in TV-GT and DGD.
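
The hyper-cuboid setting of Figure 2 ($n=12$, $(p_2,p_1,p_0)=(2,2,3)$, $\tau=3$) can be sketched as follows. The construction is an assumption made for illustration, since the paper's explicit weight matrices are not reproduced on this page: node indices are written in mixed radix, and in round $l$ each node averages uniformly, with weight $1/p_l$, over the $p_l$ nodes that differ from it only in digit $l$. The function names are hypothetical.

```python
from fractions import Fraction

def hypercuboid_matrix(radices, l):
    """Assumed round-l weight matrix; radices = (p_0, p_1, ...), least significant first."""
    n = 1
    for p in radices:
        n *= p
    stride = 1
    for p in radices[:l]:
        stride *= p
    p_l = radices[l]
    W = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        digit = (i // stride) % p_l
        base = i - digit * stride        # index i with digit l set to zero
        for d in range(p_l):
            W[i][base + d * stride] = Fraction(1, p_l)
    return W

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

radices = (3, 2, 2)                      # (p_0, p_1, p_2), so n = 12 and tau = 3
mats = [hypercuboid_matrix(radices, l) for l in range(len(radices))]

P = mats[0]
for W in mats[1:]:
    P = matmul(W, P)

n = len(P)
exact_average = all(p == Fraction(1, n) for row in P for p in row)
max_peers = max(sum(1 for j, w in enumerate(row) if j != i and w != 0)
                for W in mats for i, row in enumerate(W))
print("exact averaging after tau rounds:", exact_average)
print("largest number of peers per node in any round:", max_peers)
```

Under this assumed construction no node ever talks to more than two peers in a round, and the product of the three matrices is the exact averaging matrix even though 12 is not a power of two.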

Theorems & Definitions (17)

  • Definition 3.1
  • Proposition 3.1
  • Lemma 3.2
  • proof: Proof of \ref{lem:exp-fin-time-cons}
  • Proposition 3.3
  • proof
  • Proposition 3.4
  • proof
  • Proposition 3.5
  • proof
  • ...and 7 more