
Lower and upper bounds of the convergence rate of gradient methods with composite noise in gradient

Artem Vasin, Alexander Gasnikov

Abstract

We present a detailed convergence analysis of first-order methods for minimizing convex, smooth functions when the gradient is corrupted by composite noise, that is, the sum of a relative and an absolute component. We describe practical problems in which inexact oracles are unavoidable, such as biased compressors, floating-point arithmetic, and gradient-free optimization. We propose an algorithm that accumulates absolute error optimally, with intermediate convergence depending on the relative component of the noise. Using a restart technique, a regularization transformation, and stopping criteria, we obtain results for various function classes. We also provide a gradient descent method that adapts to the relative error parameter. For relative noise, we give lower bounds on convergence that confirm the dependence of the noise parameter on the condition number of the problem.

Paper Structure (26 sections, 50 theorems, 271 equations, 7 figures, 3 algorithms)

Key Result

Proposition 2.1

Let $A$ be a symmetric positive definite matrix and let $b, x$ be vectors of dimension $n$ whose elements are floating-point numbers (def:floating). Let $\widetilde{\nabla} f(x)$ be the gradient estimate of the function $f(x) = \frac{1}{2} x^T A x + b^T x$, with $\nabla f(x) = Ax + b$, calculated using floating point ope… If all elements of $A, b, x$ are nonnegative, then $\widetilde{\nabla} f(x)$ satisfies relative nois…
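The setting of Proposition 2.1 can be probed numerically. The sketch below is only an illustration under stated assumptions, not the paper's construction: it uses NumPy `float32` as the floating-point format, builds a symmetric positive definite $A$ with nonnegative entries as $M^T M + I$, and takes a `float64` evaluation of the same inputs as the "exact" gradient. The function name `relative_gradient_error` is hypothetical. It reports the ratio $\|\widetilde{\nabla} f(x) - \nabla f(x)\| / \|\nabla f(x)\|$ and does not reproduce the bound stated in the proposition, which is truncated above.

```python
import numpy as np


def relative_gradient_error(n=50, seed=0):
    """Relative error of a float32 gradient of f(x) = 0.5 x^T A x + b^T x.

    Assumption: float64 arithmetic on the same float32 inputs serves as
    the exact reference gradient A x + b.
    """
    rng = np.random.default_rng(seed)

    # Nonnegative M gives a nonnegative, positive semidefinite M^T M;
    # adding the identity makes it positive definite.
    M = rng.random((n, n))
    S = M.T @ M + np.eye(n)
    S = (S + S.T) / 2.0  # enforce exact symmetry before rounding

    # Inputs are rounded once to float32, so they are exact floats there.
    A32 = S.astype(np.float32)
    b32 = rng.random(n).astype(np.float32)
    x32 = rng.random(n).astype(np.float32)

    # Gradient evaluated entirely in float32 (the inexact oracle).
    grad32 = A32 @ x32 + b32

    # Reference gradient on the same inputs in float64.
    ref = A32.astype(np.float64) @ x32.astype(np.float64) + b32.astype(np.float64)

    err = np.linalg.norm(grad32.astype(np.float64) - ref)
    return err / np.linalg.norm(ref)


if __name__ == "__main__":
    print("relative error:", relative_gradient_error())
    print("float32 machine epsilon:", np.finfo(np.float32).eps)
```

Because all entries are nonnegative, no cancellation occurs in the sums, which is why the observed error stays on the order of a small multiple of machine epsilon relative to the gradient norm.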

Figures (7)

  • Figure 1: The performance of GD with composite noise and $\mu = 0, L = 100$ for different values of $\alpha$ and $\delta = 1$.
  • Figure 2: The performance of GD with composite noise and $\mu = 0, L = 100$ for different values of $\delta$ and $\alpha = 0.5$.
  • Figure 3: The performance of GD with composite noise and $\mu = 1, L = 100$ for different values of $\alpha$ and $\delta = 0.1$.
  • Figure 4: The performance of GD with composite noise and $\mu = 1, L = 100$ for different values of $\delta$ and $\alpha = 0.5$.
  • Figure 5: The performance of RE-AGM with composite noise and $\mu = 0.01, L = 100$ for different values of $\alpha$ and $\delta = 100$.
  • ...and 2 more figures
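The experiments named in the captions above can be approximated with a small simulation. This is a minimal sketch under assumptions that are not taken from the source: composite noise is modeled as $\|\widetilde{\nabla} f(x) - \nabla f(x)\| \le \alpha \|\nabla f(x)\| + \delta$, the objective is a diagonal quadratic with eigenvalues in $[\mu, L]$, the noise directions are random, and the step size is $1/L$. The names `noisy_gradient`, `quadratic`, and `run_gd` are hypothetical, and the paper's own step-size rule and test functions may differ.

```python
import numpy as np


def noisy_gradient(grad, alpha, delta, rng):
    """Return grad + e with ||e|| <= alpha * ||grad|| + delta.

    The error is a relative part of norm alpha * ||grad|| plus an
    absolute part of norm delta, each along a random unit direction.
    """
    u = rng.standard_normal(grad.shape)
    u /= np.linalg.norm(u)
    v = rng.standard_normal(grad.shape)
    v /= np.linalg.norm(v)
    return grad + alpha * np.linalg.norm(grad) * u + delta * v


def quadratic(lam, x):
    """f(x) = 0.5 * sum_i lam_i * x_i^2, minimized at x = 0."""
    return 0.5 * float(np.sum(lam * x * x))


def run_gd(lam, x0, alpha, delta, steps, seed=0):
    """Gradient descent with step 1/L driven by the noisy oracle."""
    rng = np.random.default_rng(seed)
    L = float(np.max(lam))  # smoothness constant of the quadratic
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        grad = lam * x  # exact gradient of the diagonal quadratic
        x = x - noisy_gradient(grad, alpha, delta, rng) / L
    return x


if __name__ == "__main__":
    # mu = 1, L = 100, delta = 0.1 as in the Figure 3 caption;
    # alpha = 0.5 is one sample value.
    lam = np.linspace(1.0, 100.0, 20)
    x0 = np.ones(20)
    x = run_gd(lam, x0, alpha=0.5, delta=0.1, steps=5000)
    print("f(x0) =", quadratic(lam, x0))
    print("f(x_final) =", quadratic(lam, x))
```

With this step size, one iteration changes the objective by at most $\frac{1}{2L}\left(\|e\|^2 - \|\nabla f(x)\|^2\right)$, where $e$ is the gradient error, so the iterates make progress while $\|\nabla f(x)\| > \delta / (1 - \alpha)$ and then stall in a neighborhood whose size is governed by $\delta$.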

Theorems & Definitions (88)

  • Proposition 2.1
  • Proposition 2.2
  • Proposition 2.3
  • Proposition 2.4
  • Theorem 5.1
  • Theorem 5.2
  • Theorem 6.1
  • Remark 6.1
  • Remark 6.2
  • Theorem 7.1
  • ...and 78 more