AdGT: Decentralized Gradient Tracking with Tuning-free Per-Agent Stepsize
Diyako Ghaderyan, Stefan Werner
TL;DR
This work tackles the challenge of stepsize tuning in decentralized optimization by introducing AdGT, a fully adaptive gradient-tracking method where each agent selects its own stepsize based on local smoothness. The algorithm maintains the standard GT structure with four updated variables per agent, and employs locally computed stepsizes with stability guarantees, achieving linear convergence to the consensus optimum. Theoretical results establish uniform bounds on the per-agent stepsizes and a contraction factor using a matrix $\Upsilon$ with $\rho(\Upsilon)<1$, under $L_i$-smooth and strongly convex local objectives on undirected graphs. Empirically, AdGT outperforms fixed-stepsize GT and centralized adaptive methods across various topologies and problem types, while requiring minimal tuning; adaptive stepsizes are especially advantageous under heterogeneity in local smoothness and network connectivity.
Abstract
In decentralized optimization, the choice of stepsize plays a critical role in algorithm performance. A common approach is to use a shared stepsize across all agents to ensure convergence. However, finding a good shared stepsize requires careful, time-consuming tuning, and a single value can still yield slow convergence when the smoothness constants of the local objective functions vary widely across agents. Individually tuning stepsizes per agent is also impractical, particularly in large-scale networks. To address these limitations, we propose AdGT, an adaptive gradient tracking method that enables each agent to adjust its stepsize based on the smoothness of its local objective. We prove that AdGT achieves linear convergence to the global optimal solution. Through numerical experiments, we compare AdGT with fixed-stepsize gradient tracking methods and demonstrate its superior performance. Additionally, we compare AdGT with adaptive gradient descent (AdGD) in a centralized setting and observe that fully adaptive stepsizes offer greater benefits in decentralized networks than in centralized ones.
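To make the idea of per-agent stepsizes concrete, the following is a minimal sketch of gradient tracking in which every agent sets its own stepsize from a local smoothness estimate. It is not the AdGT update itself, since the abstract does not state the exact stepsize rule. The assumptions are: quadratic local objectives $f_i(x) = \tfrac{1}{2} a_i \|x - b_i\|^2$, a fully connected network with uniform averaging weights `W`, and an illustrative secant rule $\alpha_i = c\,\|\Delta x_i\| / \|\Delta \nabla f_i\|$. The function name `adaptive_gt_sketch` and the constants `c`, `alpha0`, and `tol` are hypothetical.

```python
import numpy as np


def adaptive_gt_sketch(a, b, W, c=0.1, alpha0=0.01, iters=60, tol=1e-12):
    """Gradient tracking on f_i(x) = 0.5 * a_i * ||x - b_i||^2.

    Illustrative only: the stepsize rule below is a simple secant
    estimate of local smoothness, NOT the rule from the paper.
    """
    n, d = b.shape

    def grad(x):
        # Row i holds the gradient of f_i evaluated at agent i's iterate.
        return a[:, None] * (x - b)

    x = np.zeros((n, d))         # all agents start from the origin
    g = grad(x)
    y = g.copy()                 # tracker initialised with local gradients
    alpha = np.full(n, alpha0)   # one stepsize per agent
    max_drift = 0.0              # largest violation of mean(y) == mean(g)

    for _ in range(iters):
        # Local step with the agent's own stepsize, then mixing.
        x_new = W @ (x - alpha[:, None] * y)
        g_new = grad(x_new)
        # Tracking update: mix trackers and add the local gradient change.
        y = W @ y + g_new - g

        # Assumed rule: alpha_i = c / L_hat_i, where
        # L_hat_i = ||grad change|| / ||iterate change|| is measured locally.
        dx = np.linalg.norm(x_new - x, axis=1)
        dg = np.linalg.norm(g_new - g, axis=1)
        ok = (dx > tol) & (dg > tol)   # keep the old stepsize if degenerate
        alpha[ok] = c * dx[ok] / dg[ok]

        x, g = x_new, g_new
        drift = np.linalg.norm(y.mean(axis=0) - g.mean(axis=0))
        max_drift = max(max_drift, drift)

    return x, alpha, max_drift


# Toy problem: four agents whose smoothness constants differ by a factor of 10.
a = np.array([1.0, 2.0, 5.0, 10.0])
b = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 2.0], [2.0, -1.0]])
W = np.full((4, 4), 0.25)        # fully connected, uniform averaging (assumed)
x_star = (a[:, None] * b).sum(axis=0) / a.sum()   # minimiser of sum_i f_i

x, alpha, drift = adaptive_gt_sketch(a, b, W)
err = np.linalg.norm(x - x_star, axis=1).max()
```

On this toy problem the secant estimate recovers each $a_i$ exactly, so the stepsizes settle at $c / a_i$ and the agents' iterates approach `x_star`, while the average of the trackers `y` stays equal to the average of the local gradients. With a sparser mixing matrix, the constant `c` would likely need to be chosen more conservatively; the stepsize bounds established in the paper are what remove this kind of manual tuning.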
