Lower and upper bounds of the convergence rate of gradient methods with composite noise in gradient

Artem Vasin; Alexander Gasnikov

Abstract

We present a detailed convergence analysis of first-order methods with composite noise in the gradient (a sum of relative and absolute components) for the minimization of convex and smooth functions. We illustrate practical problems in which inexact oracles are unavoidable, such as biased compressors, floating-point arithmetic, and gradient-free optimization. We propose an algorithm that accumulates the absolute error optimally, with intermediate convergence depending on the relative component of the noise. Using a restart technique, a regularization transformation, and stopping criteria, we obtain results for various function classes. We also provide a gradient descent method that adapts to the relative error parameter. For relative noise, we give lower bounds on convergence, confirming the dependence of the noise parameter on the condition number of the problem.
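For orientation, composite noise of the kind named in the abstract is commonly written as the bound below. The symbols $\alpha$ (relative level) and $\delta$ (absolute level) match the parameters in the figure captions; the exact norm and the admissible range of $\alpha$ are assumptions here, not quotations from the paper.

```latex
\|\widetilde{\nabla} f(x) - \nabla f(x)\| \le \alpha \, \|\nabla f(x)\| + \delta,
\qquad \alpha \in [0, 1), \quad \delta \ge 0 .
```

Setting $\delta = 0$ gives purely relative noise, and setting $\alpha = 0$ gives purely absolute noise.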

Paper Structure (26 sections, 50 theorems, 271 equations, 7 figures, 3 algorithms)

Introduction
Motivation
IEEE754
Biased compressors for distributed optimization
Gradient-free methods
Relative interpretation of absolute noise
Related Work
Inexact $(\delta, L, \mu)$ oracle
Relative error
Contribution
Gradient descent
Accelerated method (RE-AGM)
Regularization
Adaptiveness
Relative interpretation of absolute noise
...and 11 more sections

Key Result

Proposition 2.1

Let $A$ be a symmetric positive definite matrix and let $b, x$ be vectors of dimension $n$ whose elements are floating point numbers (see def:floating). Let $\widetilde{\nabla} f(x)$ be the gradient estimate of the function $f(x) = \frac{1}{2} x^T A x + b^T x, \; \nabla f(x) = Ax + b$, calculated using floating point operations. If all elements of $A, b, x$ are nonnegative, then $\widetilde{\nabla} f(x)$ satisfies relative noise...
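The effect described in the proposition can be observed numerically. The sketch below is only an illustration under stated assumptions: the test data are random and hypothetical, float32 stands in for the floating point format, and a float64 evaluation is used as a proxy for the exact gradient. It does not reproduce the constant from the proposition, which is truncated above.

```python
import numpy as np

def relative_gradient_error(n=200, seed=0):
    """Relative error of a float32 evaluation of A x + b with nonnegative data."""
    rng = np.random.default_rng(seed)

    # Hypothetical test data, stored exactly in float32 with nonnegative entries.
    M = rng.random((n, n)).astype(np.float32)
    B = M @ M.T + np.float32(n) * np.eye(n, dtype=np.float32)
    A = (B + B.T) / np.float32(2)  # enforce exact symmetry; A is positive definite
    b = rng.random(n).astype(np.float32)
    x = rng.random(n).astype(np.float32)

    # Gradient computed entirely in float32 arithmetic.
    grad_lo = A @ x + b

    # Proxy for the exact gradient: same inputs, arithmetic in float64.
    grad_ref = A.astype(np.float64) @ x.astype(np.float64) + b.astype(np.float64)

    err = np.linalg.norm(grad_lo.astype(np.float64) - grad_ref)
    return err / np.linalg.norm(grad_ref)

alpha_hat = relative_gradient_error()
print(f"observed relative error: {alpha_hat:.2e}")
```

Because all entries are nonnegative, no cancellation occurs in the sums, so the observed error stays a small multiple of the float32 unit roundoff relative to the gradient norm.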

Figures (7)

Figure 1: The performance of GD with composite noise and $\mu = 0, L = 100$ for different values of $\alpha$ and $\delta = 1$.
Figure 2: The performance of GD with composite noise and $\mu = 0, L = 100$ for different values of $\delta$ and $\alpha = 0.5$.
Figure 3: The performance of GD with composite noise and $\mu = 1, L = 100$ for different values of $\alpha$ and $\delta = 0.1$.
Figure 4: The performance of GD with composite noise and $\mu = 1, L = 100$ for different values of $\delta$ and $\alpha = 0.5$.
Figure 5: The performance of RE-AGM with composite noise and $\mu = 0.01, L = 100$ for different values of $\alpha$ and $\delta = 100$.
...and 2 more figures
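The kind of experiment summarized in these captions can be sketched as follows. This is a minimal illustration, not the authors' setup: the diagonal quadratic, the random noise model, the iteration count, the values of $\delta$, and the step size $h = (1-\alpha)/((1+\alpha)^2 L)$ (taken from the standard descent lemma under relative noise) are all assumptions.

```python
import numpy as np

def noisy_gradient(grad, alpha, delta, rng):
    """Return grad + e, where ||e|| <= alpha * ||grad|| + delta (composite noise)."""
    direction = rng.standard_normal(grad.shape)
    direction /= np.linalg.norm(direction)
    radius = alpha * np.linalg.norm(grad) + delta
    return grad + rng.uniform(0.0, 1.0) * radius * direction

def gd_composite_noise(alpha, delta, mu=1.0, L=100.0, n=50, iters=2000, seed=0):
    """Run gradient descent with a noisy gradient and return the final f(x)."""
    rng = np.random.default_rng(seed)
    # Hypothetical test problem: f(x) = 0.5 * sum(d_i * x_i^2), spectrum in [mu, L].
    d = np.linspace(mu, L, n)
    x = np.ones(n)
    # Assumed step size; it guarantees descent when delta = 0 and alpha < 1.
    h = (1.0 - alpha) / ((1.0 + alpha) ** 2 * L)
    for _ in range(iters):
        g = d * x  # exact gradient of the quadratic
        x = x - h * noisy_gradient(g, alpha, delta, rng)
    return 0.5 * np.sum(d * x * x)

# Fixed alpha = 0.5, several hypothetical absolute noise levels.
results = {delta: gd_composite_noise(alpha=0.5, delta=delta) for delta in (0.0, 0.1, 1.0)}
for delta, value in results.items():
    print(f"delta = {delta}: final f(x) = {value:.3e}")
```

With $\delta = 0$ the function value decreases at every step under this step size; with $\delta > 0$ the iterates are only expected to reach a neighbourhood of the minimizer whose size grows with $\delta$.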

Theorems & Definitions (88)

Proposition 2.1
Proposition 2.2
Proposition 2.3
Proposition 2.4
Theorem 5.1
Theorem 5.2
Theorem 6.1
Remark 6.1
Remark 6.2
Theorem 7.1
...and 78 more
