AdGT: Decentralized Gradient Tracking with Tuning-free Per-Agent Stepsize

Diyako Ghaderyan, Stefan Werner

TL;DR

This work tackles the challenge of stepsize tuning in decentralized optimization by introducing AdGT, a fully adaptive gradient-tracking (GT) method in which each agent selects its own stepsize based on the smoothness of its local objective. The algorithm keeps the standard GT structure, updating four variables per agent, and uses locally computed stepsizes with stability guarantees to achieve linear convergence to the consensus optimum. The analysis assumes $L_i$-smooth, strongly convex local objectives on undirected graphs. Under these assumptions, it establishes uniform bounds on the per-agent stepsizes and a contraction factor given by a matrix $\Upsilon$ with spectral radius $\rho(\Upsilon)<1$. Empirically, AdGT outperforms fixed-stepsize GT across various topologies and problem types while requiring minimal tuning. A comparison with centralized adaptive gradient descent indicates that adaptive stepsizes are especially advantageous in decentralized settings, where local smoothness and network connectivity are heterogeneous.
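The contraction claim follows a standard pattern in gradient-tracking analyses. It is sketched here under assumptions: the entries of $\mathbf{e}^k$ (for instance an optimality gap, a consensus error and a tracking error) and of $\Upsilon$ are generic placeholders, not the paper's definitions. Suppose these nonnegative error measures satisfy a componentwise linear recursion with an entrywise nonnegative matrix. Then $\rho(\Upsilon)<1$ is enough for linear convergence:

```latex
% Hypothetical error vector e^k >= 0 (e.g. optimality gap, consensus error, tracking error)
\mathbf{e}^{k+1} \le \Upsilon\,\mathbf{e}^{k}\ \text{(componentwise)},\quad \Upsilon \ge 0\ \text{(entrywise)}
\;\Longrightarrow\; 0 \le \mathbf{e}^{k} \le \Upsilon^{k}\mathbf{e}^{0}
\;\Longrightarrow\; \|\mathbf{e}^{k}\|_2 \le \|\Upsilon^{k}\|_2\,\|\mathbf{e}^{0}\|_2
  \le C_\varepsilon\,\bigl(\rho(\Upsilon)+\varepsilon\bigr)^{k}\,\|\mathbf{e}^{0}\|_2 .
```

The first implication is an induction. It uses the fact that multiplying a componentwise inequality by a nonnegative matrix preserves it. The last step is Gelfand's formula, which holds for every $\varepsilon>0$ with some constant $C_\varepsilon$. Choosing $\varepsilon < 1-\rho(\Upsilon)$ therefore gives a geometric rate.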

Abstract

In decentralized optimization, the choice of stepsize plays a critical role in algorithm performance. A common approach is to use a shared stepsize across all agents to ensure convergence. However, selecting a good shared stepsize requires careful tuning, which can be time-consuming, and the tuned stepsize may still yield slow convergence. This is especially likely when the smoothness ($L$-smoothness) of the local objective functions varies significantly across agents. Individually tuning stepsizes per agent is also impractical, particularly in large-scale networks. To address these limitations, we propose AdGT, an adaptive gradient tracking method that enables each agent to adjust its stepsize based on the smoothness of its local objective. We prove that AdGT achieves linear convergence to the global optimal solution. Through numerical experiments, we compare AdGT with fixed-stepsize gradient tracking methods and demonstrate its superior performance. Additionally, we compare AdGT with adaptive gradient descent (AdGD) in a centralized setting and observe that fully adaptive stepsizes offer greater benefits in decentralized networks than in centralized ones.
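The core idea, GT in which every agent picks its own stepsize from local information, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not AdGT's actual update, which this summary does not reproduce. The following are choices made here:

- quadratic local objectives $f_i(x) = \tfrac{L_i}{2}\|x-b_i\|^2$;
- a 4-node ring;
- the stepsize rule $\alpha_i = c/\hat L_i$.

In that rule, $\hat L_i$ is a running maximum of local gradient-difference ratios and `c` is a hypothetical, deliberately conservative safety factor. The sketch shows the two coupled GT recursions: a consensus-plus-descent step on $x_i$ and a tracking step on $y_i$, with no stepsize shared across agents.

```python
import numpy as np

# Metropolis weights for a 4-node ring: every node has degree 2, so each
# neighbour and the node itself get weight 1/3 (symmetric, doubly stochastic).
W = np.array([[1, 1, 0, 1],
              [1, 1, 1, 0],
              [0, 1, 1, 1],
              [1, 0, 1, 1]]) / 3.0


def gt_per_agent_steps(W, L, B, c=0.01, iters=5000, seed=0):
    """Gradient tracking where agent i uses its own stepsize alpha_i = c / Lhat_i.

    Test objectives (hypothetical): f_i(x) = (L_i / 2) * ||x - b_i||^2.
    Lhat_i is a running maximum of local ratios ||grad_i(x') - grad_i(x)|| / ||x' - x||.
    This rule and the factor c are illustrative assumptions, NOT the AdGT stepsize.
    """
    rng = np.random.default_rng(seed)
    n, d = B.shape

    def grad(X):
        return L[:, None] * (X - B)            # row i = grad f_i evaluated at x_i

    X = rng.standard_normal((n, d))
    G = grad(X)
    Y = G.copy()                               # trackers start at the local gradients

    # One local probe per agent initialises its smoothness estimate.
    P = X + 1e-3 * rng.standard_normal((n, d))
    Lhat = np.linalg.norm(grad(P) - G, axis=1) / np.linalg.norm(P - X, axis=1)

    for _ in range(iters):
        alpha = c / Lhat                       # per-agent stepsizes, no shared value
        X_new = W @ X - alpha[:, None] * Y     # mix with neighbours, step along tracker
        G_new = grad(X_new)
        Y = W @ Y + G_new - G                  # y_i tracks the network-average gradient
        dx = np.linalg.norm(X_new - X, axis=1)
        dg = np.linalg.norm(G_new - G, axis=1)
        ok = dx > 1e-12                        # skip ratios dominated by round-off
        Lhat[ok] = np.maximum(Lhat[ok], dg[ok] / dx[ok])
        X, G = X_new, G_new
    return X, Y, G, alpha


L = np.array([1.0, 2.0, 4.0, 8.0])             # heterogeneous local smoothness
B = np.array([[1.0, 0.0], [-1.0, 2.0], [2.0, -1.0], [0.0, 1.0]])
X, Y, G, alpha = gt_per_agent_steps(W, L, B)
x_star = (L[:, None] * B).sum(axis=0) / L.sum()  # minimiser of sum_i f_i
print(alpha)                                   # roughly c / L_i for each agent
print(np.abs(X - x_star).max())                # distance of every agent to x_star
```

$W$ is doubly stochastic and $y_i^0 = \nabla f_i(x_i^0)$, so the trackers satisfy $\sum_i y_i^k = \sum_i \nabla f_i(x_i^k)$ at every iteration. This invariant is why a fixed point of the iteration is the exact minimizer $x^\star = \sum_i L_i b_i / \sum_i L_i$ even though the agents use different stepsizes.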

Paper Structure

This paper contains 9 sections, 8 theorems, 74 equations, 3 figures, 1 algorithm.

Key Result

Lemma 1

Suppose the $L$-smoothness and strong-connectivity assumptions hold. Let the stepsizes $\{\alpha_i^k\}_{k\geq 0}$ be defined by the AdGT stepsize rule. Then, for every agent $i$ and all $k \geq 0$, the stepsizes are uniformly bounded as
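Lemma 1's explicit bound is not reproduced in this summary. The derivation below is a hedged illustration of how uniform bounds of this kind can arise. It assumes each agent used an AdGD-style rule of the form shown. This form is an assumption, not AdGT's quoted update. When $x_i^k = x_i^{k-1}$, the second term is read as $+\infty$.

```latex
% Assumed AdGD-style per-agent rule (illustrative, not quoted from the paper):
\alpha_i^{k} = \min\Bigl\{ \sqrt{1+\theta_i^{k-1}}\,\alpha_i^{k-1},\;
  \frac{\|x_i^{k}-x_i^{k-1}\|}{2\,\|\nabla f_i(x_i^{k})-\nabla f_i(x_i^{k-1})\|} \Bigr\},
\qquad \theta_i^{k} = \frac{\alpha_i^{k}}{\alpha_i^{k-1}} .
% L_i-smoothness: \|\nabla f_i(x)-\nabla f_i(x')\| \le L_i\|x-x'\|, so the second term is >= 1/(2L_i);
% the first term is >= \alpha_i^{k-1} because \theta_i^{k-1} >= 0. By induction on k:
\alpha_i^{k} \ge \min\Bigl\{\alpha_i^{k-1}, \frac{1}{2L_i}\Bigr\}
\;\Longrightarrow\;
\alpha_i^{k} \ge \min\Bigl\{\alpha_i^{0}, \frac{1}{2L_i}\Bigr\} \quad \text{for all } k \ge 0 .
% If f_i is also \mu_i-strongly convex, \|\nabla f_i(x)-\nabla f_i(x')\| \ge \mu_i\|x-x'\|,
% so whenever x_i^k \ne x_i^{k-1} the second term, and hence \alpha_i^k, is at most 1/(2\mu_i):
\alpha_i^{k} \le \frac{1}{2\mu_i}.
```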

Figures (3)

  • Figure 1: Decentralized logistic regression with $n=16$. (a) $\{r(k)\}_k$ for Star, (b) $\{r(k)\}_k$ for Cycle, (c) $\{r(k)\}_k$ for Line, (d) $\{r(k)\}_k$ for Ladder, (e) $\{r(k)\}_k$ for Random graph with connectivity ratio 0.2, (f) $\{r(k)\}_k$ for Random graph with connectivity ratio 0.35.
  • Figure 2: Decentralized quadratic problem with $n=100$.
  • Figure 3: Performance of ridge regression over cycle and random graphs.
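The experiments above run on star, cycle, line, ladder and random graphs. The snippet below sketches how such topologies and a doubly stochastic mixing matrix might be built. Several parts are assumptions, because the summary does not state the paper's weight matrix or graph generator:

- the Metropolis-Hastings weights;
- the ladder layout (two rails joined by rungs);
- reading "connectivity ratio" as an independent edge probability.

The snippet also reports the second-largest eigenvalue magnitude of $W$, a common measure of how quickly information mixes over the network.

```python
import numpy as np


def metropolis_weights(adj):
    """Symmetric, doubly stochastic Metropolis-Hastings mixing matrix.

    adj: symmetric 0/1 adjacency matrix with a zero diagonal. Each edge (i, j)
    gets 1 / (1 + max(deg_i, deg_j)); the diagonal takes the remainder of each row.
    """
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    W = np.where(adj > 0, 1.0 / (1.0 + np.maximum.outer(deg, deg)), 0.0)
    np.fill_diagonal(W, 1.0 - W.sum(axis=1))
    return W


def line(n):
    adj = np.zeros((n, n))
    i = np.arange(n - 1)
    adj[i, i + 1] = adj[i + 1, i] = 1.0
    return adj


def cycle(n):
    adj = line(n)
    adj[0, n - 1] = adj[n - 1, 0] = 1.0        # close the line into a ring
    return adj


def star(n):
    adj = np.zeros((n, n))
    adj[0, 1:] = adj[1:, 0] = 1.0              # node 0 is the hub
    return adj


def ladder(n):
    """Two rails of n // 2 nodes joined by rungs (assumed layout; n even)."""
    m = n // 2
    adj = np.zeros((n, n))
    adj[:m, :m] = line(m)                      # first rail
    adj[m:, m:] = line(m)                      # second rail
    i = np.arange(m)
    adj[i, i + m] = adj[i + m, i] = 1.0        # rungs
    return adj


def random_graph(n, ratio, seed=0):
    """Keep each possible edge with probability `ratio`; redraw until connected."""
    rng = np.random.default_rng(seed)
    while True:
        upper = np.triu(rng.random((n, n)) < ratio, k=1).astype(float)
        adj = upper + upper.T
        lap = np.diag(adj.sum(axis=1)) - adj
        if np.linalg.eigvalsh(lap)[1] > 1e-9:  # algebraic connectivity > 0
            return adj


def mixing_rate(W):
    """Second-largest eigenvalue magnitude of a symmetric W (smaller mixes faster)."""
    return np.sort(np.abs(np.linalg.eigvalsh(W)))[-2]


for name, adj in [("star", star(16)), ("cycle", cycle(16)), ("line", line(16)),
                  ("ladder", ladder(16)), ("random 0.2", random_graph(16, 0.2)),
                  ("random 0.35", random_graph(16, 0.35))]:
    print(name, round(mixing_rate(metropolis_weights(adj)), 3))
```

With 16 nodes, the line graph mixes more slowly than the cycle. This is one way topology can change how far a single shared stepsize must be tuned down.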

Theorems & Definitions (24)

  • Remark 1
  • Remark 2
  • Remark 3
  • Definition 1
  • Remark 4
  • Remark 5
  • Lemma 1
  • Proof
  • Remark 6
  • Lemma 2
  • ...and 14 more