Optimized Gradient Tracking for Decentralized Online Learning

Shivangi Dubey Sharma; Ketan Rajawat

TL;DR

The paper tackles decentralized online learning, where $n$ nodes collaboratively track the time-varying minimizer of $f^k(x)=\frac{1}{n}\sum_{i=1}^n f_i^k(x)$. It introduces Generalized Gradient Tracking (GGT), a unified framework that blends consensus and gradient-tracking updates, and provides a novel SDP-based analysis that yields dynamic regret bounds without assuming bounded gradients. A condensed, four-parameter version, oGGT, can be tuned offline to minimize regret, giving improved theoretical guarantees and better empirical performance than state-of-the-art decentralized online algorithms. The method is evaluated on synthetic target tracking and on real-world room occupancy prediction, which positions oGGT as a practical, offline-tunable template for dynamic decentralized learning.
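
For orientation, dynamic regret compares the learner's iterates against the per-round minimizers rather than a single fixed comparator. The block below writes out one common form of this quantity. The symbols $x_\star^k$, $x^k$, $K$, and $\mathrm{Reg}_K$ are assumptions of this note, and the paper's exact definition (for example, how it averages over nodes) may differ.

```latex
x_\star^k = \arg\min_{x \in \mathbb{R}^d} f^k(x),
\qquad
\mathrm{Reg}_K = \sum_{k=1}^{K} \bigl( f^k(x^k) - f^k(x_\star^k) \bigr),
\qquad
f^k(x) = \frac{1}{n}\sum_{i=1}^{n} f_i^k(x).
```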

Abstract

This work considers the problem of decentralized online learning, where the goal is to track the optimum of the sum of time-varying functions, distributed across several nodes in a network. The local availability of the functions and their gradients necessitates coordination and consensus among the nodes. We put forth the Generalized Gradient Tracking (GGT) framework that unifies a number of existing approaches, including the state-of-the-art ones. The performance of the proposed GGT algorithm is theoretically analyzed using a novel semidefinite programming-based analysis that yields the desired regret bounds under very general conditions and without requiring the gradient boundedness assumption. The results are applicable to the special cases of GGT, which include various state-of-the-art algorithms as well as new dynamic versions of various classical decentralized algorithms. To further minimize the regret, we consider a condensed version of GGT with only four free parameters. A procedure for offline tuning of these parameters using only the problem parameters is also detailed. The resulting optimized GGT (oGGT) algorithm not only achieves improved dynamic regret bounds, but also outperforms all state-of-the-art algorithms on both synthetic and real-world datasets.
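
To make the setting concrete, the following is a minimal sketch of a plain online gradient-tracking recursion on a toy least-squares target-tracking problem. It is not GGT or oGGT, whose update rules and tuned parameters are not given in this summary. It only illustrates the two ingredients the abstract refers to: a consensus step through a mixing matrix and a tracker of the network-average gradient. The ring topology, the lazy mixing weights, the circular target path, the step size `eta`, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def lazy_ring_weights(n):
    """Doubly stochastic mixing matrix for a ring (hypothetical choice):
    weight 1/2 on self and 1/4 on each of the two neighbors."""
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 0.5
        W[i, (i - 1) % n] += 0.25
        W[i, (i + 1) % n] += 0.25
    return W

def run_online_gradient_tracking(n=6, steps=300, eta=0.1, drift=0.01, seed=0):
    """Plain online gradient tracking on f_i^k(x) = 0.5 * ||A_i (x - theta^k)||^2.

    Returns the mean distance between the node iterates and the moving
    target theta^k at every round.
    """
    d = 2
    rng = np.random.default_rng(seed)
    W = lazy_ring_weights(n)
    # Local measurement matrices, kept close to identity so the toy problem
    # is well conditioned (an assumption, not taken from the paper).
    A = np.eye(d) + 0.1 * rng.standard_normal((n, d, d))
    H = np.einsum("imd,ime->ide", A, A)  # H_i = A_i^T A_i

    def target(k):
        # Time-varying minimizer: a point moving slowly on the unit circle.
        return np.array([np.cos(drift * k), np.sin(drift * k)])

    def grads(X, k):
        # Row i holds the gradient of f_i^k at node i's iterate.
        return np.einsum("ide,ie->id", H, X - target(k))

    X = np.zeros((n, d))   # one row of decision variables per node
    G = grads(X, 0)
    Y = G.copy()           # tracker, initialized to the local gradients
    errors = []
    for k in range(steps):
        errors.append(np.linalg.norm(X - target(k), axis=1).mean())
        X_next = W @ X - eta * Y          # consensus plus descent along tracker
        G_next = grads(X_next, k + 1)     # gradients of the next round's losses
        Y = W @ Y + G_next - G            # gradient-tracking correction
        X, G = X_next, G_next
    return np.array(errors)

if __name__ == "__main__":
    errs = run_online_gradient_tracking()
    print("initial error:", errs[0], "final error:", errs[-1])
```

Because the target keeps moving, the error in this sketch is expected to settle at a nonzero level that depends on `drift` and `eta` rather than converge to zero, which is the behavior dynamic regret is designed to measure.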

Paper Structure (22 sections, 11 theorems, 83 equations, 6 figures, 3 tables, 3 algorithms)

Introduction
Problem Formulation
Review of decentralized optimization algorithms
Review of decentralized online learning algorithms
Unified Algorithm
Regret Rate Analysis
Assumptions
Compact Form
Preliminary Results
Regret Analysis
Optimized GGT
Numerical Analysis
Tracking Time Varying Target via Least-Squares
Real Data: Online learning
Conclusion
...and 7 more sections

Key Result

Lemma 1

If a function $f_i^k$ satisfies Assumption a1 such that $L_i{\mathbf{I}}_d - {\mathbf{M}}_i$ is invertible, then it holds that [...] where ${\mathbf{u}} = \nabla f_i^k({\mathbf{x}})$, ${\mathbf{v}} = \nabla f_i^k({\mathbf{y}})$, and ${\mathbf{L}}_{{\mathbf{M}}_i} = L_i {\mathbf{I}}_d - {\mathbf{M}}_i$, for all ${\mathbf{x}}$, ${\mathbf{y}} \in {\mathbb{R}}^d$.

Figures (6)

Figure 1: Block diagram for hyperparameter tuning of GGT
Figure 2: Convergence rate parameter ($\rho$) of various algorithms against the step-size parameter ($\eta$). The dashed line for oGGT corresponds to the optimal $\rho$ obtained after tuning all its hyperparameters.
Figure 3: Regret rates of the GGT variants for target tracking
Figure 4: Regret rates of state-of-the-art algorithms and oGGT for target tracking
Figure 5: Regret rates of the GGT variants for room occupancy prediction
...and 1 more figure

Theorems & Definitions (12)

Lemma 1
Theorem 1
Corollary 1
Corollary 2
Lemma 2
Lemma 3
Theorem 2
proof
Corollary 3
Theorem 3
...and 2 more
