
Analysis of Gradient Descent with Varying Step Sizes using Integral Quadratic Constraints

Ram Padmanabhan, Peter Seiler

TL;DR

Gradient descent with varying step sizes is modeled as a linear parameter-varying (LPV) system. A parameterized linear matrix inequality (LMI) condition that certifies algorithm performance is then constructed and solved using a result for polytopic LPV systems.

Abstract

We use the framework of Integral Quadratic Constraints (IQCs) to analyze gradient descent with varying step sizes. Two performance metrics are considered: convergence rate and noise amplification. We assume that the step size is produced by a line search and varies in a known interval. Modeling the algorithm as a linear parameter-varying (LPV) system, we construct a parameterized linear matrix inequality (LMI) condition that certifies algorithm performance, and we solve it using a result for polytopic LPV systems. Our results provide convergence rate guarantees when the step size lies within a restricted interval. Moreover, we recover existing rate bounds when this interval reduces to a single point, i.e., a constant step size. Finally, we note that the convergence rate depends only on the condition number of the problem. In contrast, the noise amplification performance depends on the individual values of the strong convexity and smoothness parameters, and varies inversely with them for a fixed condition number.
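To make the setting concrete, the following is a minimal sketch of gradient descent whose step size varies within a known interval, applied to a strongly convex quadratic. It is a sanity check for quadratics only and is not the LMI-based analysis described above. The function names, the interval endpoints, and the random draw of step sizes (standing in for a line search) are all illustrative assumptions. For a quadratic with Hessian eigenvalues in $[m, L]$, one step with step size $\alpha$ contracts the distance to the minimizer by at most $\max(|1 - \alpha m|, |1 - \alpha L|)$, and since this factor is convex in $\alpha$, its worst case over an interval is attained at an endpoint.

```python
import numpy as np

def varying_step_gd(Q, x0, alphas):
    """Iterate x_{k+1} = x_k - alpha_k * Q x_k for f(x) = 0.5 x^T Q x (minimizer at 0)."""
    xs = [np.asarray(x0, dtype=float)]
    for a in alphas:
        xs.append(xs[-1] - a * (Q @ xs[-1]))
    return np.array(xs)

def quadratic_step_bound(m, L, alpha_lo, alpha_hi):
    """Worst one-step contraction factor over [alpha_lo, alpha_hi] (quadratics only)."""
    def rho(a):
        return max(abs(1.0 - a * m), abs(1.0 - a * L))
    # rho is convex in a, so the maximum over an interval sits at an endpoint.
    return max(rho(alpha_lo), rho(alpha_hi))

rng = np.random.default_rng(0)
m, L = 1.0, 10.0                      # strong convexity and smoothness parameters
Q = np.diag(np.linspace(m, L, 5))     # Hessian with eigenvalues in [m, L]
alpha_lo, alpha_hi = 0.10, 0.18       # hypothetical step-size interval, below 2/L
alphas = rng.uniform(alpha_lo, alpha_hi, size=50)  # stand-in for a line search

xs = varying_step_gd(Q, rng.standard_normal(5), alphas)
norms = np.linalg.norm(xs, axis=1)
ratios = norms[1:] / norms[:-1]       # observed per-step contraction
bound = quadratic_step_bound(m, L, alpha_lo, alpha_hi)
print(f"largest observed ratio: {ratios.max():.4f}, bound: {bound:.4f}")
```

Every observed per-step ratio should fall at or below the endpoint bound. Shrinking the interval to a single point gives the constant step size case.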

Paper Structure (16 sections, 5 theorems, 43 equations, 4 figures)


Key Result

Lemma 1

Let $f \in \mathcal{S}(m, L)$ and $\phi = \nabla f$. Then, $\nabla f$ satisfies the pointwise IQC defined by:
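The defining inequality is not reproduced in this summary. As a hedged illustration, the sector constraint for gradients of $m$-strongly convex, $L$-smooth functions is commonly written in the IQC literature as $-2mL\,\|y - y_\star\|^2 + 2(L + m)\,(y - y_\star)^\top (u - u_\star) - 2\,\|u - u_\star\|^2 \geq 0$ with $u = \nabla f(y)$; the paper's exact notation and scaling may differ. The sketch below checks this assumed form numerically on a quadratic test function. The helper name `sector_iqc_value` and the test problem are hypothetical.

```python
import numpy as np

def sector_iqc_value(y, u, y_star, u_star, m, L):
    """Evaluate the assumed sector quadratic form at one input-output pair (y, u)."""
    dy, du = y - y_star, u - u_star
    return -2.0 * m * L * (dy @ dy) + 2.0 * (L + m) * (dy @ du) - 2.0 * (du @ du)

rng = np.random.default_rng(1)
m, L = 1.0, 10.0
Q = np.diag(np.linspace(m, L, 4))     # f(y) = 0.5 y^T Q y, eigenvalues in [m, L]

def grad(y):
    return Q @ y                      # gradient of the quadratic test function

y_star = np.zeros(4)                  # minimizer of f
u_star = grad(y_star)                 # gradient at the minimizer (zero)

values = [
    sector_iqc_value(y, grad(y), y_star, u_star, m, L)
    for y in rng.standard_normal((100, 4))
]
min_value = min(values)
print(f"smallest value over 100 samples: {min_value:.4f}")
```

For this quadratic the form reduces to $2\,y^\top (Q - mI)(LI - Q)\,y$, which is nonnegative whenever the eigenvalues of $Q$ lie in $[m, L]$, so every sampled value should be nonnegative up to rounding.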

Figures (4)

  • Figure 1: $\phi$ is the nonlinear component we wish to analyze, and is replaced by the constraints it imposes on the input-output pair $(u, y)$. These are written as constraints on $z_k$.
  • Figure 2: The approximate number of iterations to convergence as a function of the condition number for gradient descent with a varying step size characterized by $c$.
  • Figure 3: The upper bound on the noise amplification metric as a function of the condition number for gradient descent with a varying step size characterized by $c$.
  • Figure 4: Tradeoff between noise amplification $\gamma$ and convergence rate $\rho$, based on \ref{eq:tradeoff}. A 'faster' algorithm has a larger value of the metric $\gamma$ and is thus more sensitive to noise. In this figure, the problem dimension is $n = 1$ and the strong convexity parameter is $m = 1$.

Theorems & Definitions (11)

  • Definition 1: Convergence Rate
  • Definition 2: Noise Amplification
  • Definition 3
  • Definition 4
  • Lemma 1: Sector IQC, LL2016
  • Theorem 1: Problem \ref{problem:CR}, Convergence Rate
  • proof
  • Theorem 2: Problem \ref{problem:NA}, Noise Amplification
  • proof
  • Proposition 1
  • ...and 1 more