Distributed Adaptive Gradient Algorithm with Gradient Tracking for Stochastic Non-Convex Optimization

Dongyu Han, Kun Liu, Yeming Lin, Yuanqing Xia

TL;DR

An upper bound on the optimality gap is established, which indicates that the proposed algorithm can reach a first-order stationary solution dependent on the upper bound on the variance of the stochastic gradients.

Abstract

This paper considers a distributed stochastic non-convex optimization problem, where the nodes in a network cooperatively minimize a sum of $L$-smooth local cost functions with sparse gradients. By adaptively adjusting the stepsizes according to the historical (possibly sparse) gradients, a distributed adaptive gradient algorithm is proposed, in which a gradient tracking estimator is used to handle the heterogeneity between different local cost functions. We establish an upper bound on the optimality gap, which indicates that our proposed algorithm can reach a first-order stationary solution dependent on the upper bound on the variance of the stochastic gradients. Finally, numerical examples are presented to illustrate the effectiveness of the algorithm.
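To make the two ingredients named in the abstract concrete, the following is a minimal sketch of a generic scheme that combines a gradient tracking estimator with AdaGrad-style per-coordinate stepsizes. It is not the authors' algorithm, whose exact update rule is not given here. Everything in the setup is an assumption made for illustration: four nodes on a ring, the mixing matrix `W`, simple quadratic local costs $f_i(x) = \frac{1}{2}\|x - b_i\|^2$ (chosen over a non-convex cost so the behaviour is easy to check), Gaussian gradient noise of standard deviation `sigma`, and the names `run_sketch`, `targets`, `alpha`, and `eps`.

```python
import math
import random


def run_sketch(num_iters=1000, alpha=0.5, sigma=0.1, eps=1e-8, seed=0):
    """Illustrative gradient tracking with AdaGrad-style stepsizes (assumed setup)."""
    rng = random.Random(seed)

    # Hypothetical problem: node i holds f_i(x) = 0.5 * ||x - b_i||^2.
    targets = [[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0], [2.0, -3.0]]
    n, d = len(targets), len(targets[0])

    # Doubly stochastic mixing matrix for a ring of n nodes (assumed topology).
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = 0.5
        W[i][(i - 1) % n] = 0.25
        W[i][(i + 1) % n] = 0.25

    def stoch_grad(i, x):
        # Exact local gradient (x - b_i) plus zero-mean Gaussian noise.
        return [x[j] - targets[i][j] + rng.gauss(0.0, sigma) for j in range(d)]

    def mix(vectors, i):
        # Weighted average of the vectors held by node i and its neighbours.
        return [sum(W[i][k] * vectors[k][j] for k in range(n)) for j in range(d)]

    def global_grad_norm(xs):
        # Norm of the exact average gradient at the network-average iterate.
        x_bar = [sum(x[j] for x in xs) / n for j in range(d)]
        grad = [sum(x_bar[j] - b[j] for b in targets) / n for j in range(d)]
        return math.sqrt(sum(c * c for c in grad))

    x = [[5.0, -5.0] for _ in range(n)]
    g = [stoch_grad(i, x[i]) for i in range(n)]
    y = [list(gi) for gi in g]          # trackers start at the local gradients
    v = [[0.0] * d for _ in range(n)]   # running sums of squared local gradients
    initial = global_grad_norm(x)

    for _ in range(num_iters):
        # Accumulate squared stochastic gradients per node and per coordinate.
        for i in range(n):
            for j in range(d):
                v[i][j] += g[i][j] ** 2

        # Consensus step followed by an adaptively scaled step along the tracker.
        x_new = []
        for i in range(n):
            xm = mix(x, i)
            x_new.append([xm[j] - alpha * y[i][j] / (math.sqrt(v[i][j]) + eps)
                          for j in range(d)])

        # Tracker update: mix neighbours' trackers, add the local gradient change.
        g_new = [stoch_grad(i, x_new[i]) for i in range(n)]
        y_new = []
        for i in range(n):
            ym = mix(y, i)
            y_new.append([ym[j] + g_new[i][j] - g[i][j] for j in range(d)])

        x, y, g = x_new, y_new, g_new

    return initial, global_grad_norm(x), x


if __name__ == "__main__":
    start, end, _ = run_sketch()
    print("gradient norm at average iterate: start", start, "end", end)
```

In this sketch the tracker `y[i]` follows the network-wide average gradient even though each node only sees its own cost, which is how the heterogeneity between the `targets` is handled. With noisy gradients the final gradient norm is expected to settle at a small nonzero level, in line with the abstract's statement that the guarantee depends on the gradient variance bound.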

Paper Structure

This paper contains 8 sections, 13 theorems, 76 equations, 4 figures, 1 algorithm.

Key Result

Lemma 1

Under the smoothness assumption (ass_smooth) and part (b) of the stochastic gradient assumption (assumption_sto_grad), we have the following result: with

Figures (4)

  • Figure 1: Comparison of different algorithms on distributed robust linear regression problems. Solid curves and shaded regions represent the average value and range statistics, respectively.
  • Figure 2: Comparison of different algorithms on distributed logistic regression on a9a dataset.
  • Figure 3: Comparison of different algorithms on distributed logistic regression on Covertype dataset.
  • Figure 4: Comparison of different algorithms on distributed logistic regression on MNIST dataset.

Theorems & Definitions (36)

  • Remark 1
  • Remark 2
  • Remark 3
  • Lemma 1
  • Proof
  • Lemma 2
  • Proof
  • Lemma 3
  • Proof
  • Lemma 4
  • ...and 26 more