New Results on the Polyak Stepsize: Tight Convergence Analysis and Universal Function Classes

Chang He, Wenzhi Gao, Bo Jiang, Madeleine Udell, Shuzhong Zhang

TL;DR

The Polyak stepsize is universal: it adapts automatically to various function classes without requiring prior knowledge of problem parameters. The paper also provides new convergence guarantees for PolyakGD under both Hölder smoothness and Hölder growth conditions.

Abstract

In this paper, we revisit a classical adaptive stepsize strategy for gradient descent: the Polyak stepsize (PolyakGD), originally proposed in Polyak (1969). We study the convergence behavior of PolyakGD from two perspectives: tight worst-case analysis and universality across function classes. As our first main result, we establish the tightness of the known convergence rates of PolyakGD by explicitly constructing worst-case functions. In particular, we show that the $O((1-\frac{1}{\kappa})^K)$ rate for smooth strongly convex functions and the $O(1/K)$ rate for smooth convex functions are both tight. Moreover, we theoretically show that PolyakGD automatically exploits floating-point errors to escape the worst-case behavior. Our second main result provides new convergence guarantees for PolyakGD under both Hölder smoothness and Hölder growth conditions. These findings show that the Polyak stepsize is universal, automatically adapting to various function classes without requiring prior knowledge of problem parameters.
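
For readers unfamiliar with the method, the following is a minimal sketch of gradient descent with a Polyak-type stepsize. It assumes the commonly used form $\alpha_k = \gamma \, (f(x^k) - f^\star) / \|\nabla f(x^k)\|^2$ with a known optimal value $f^\star$. The paper's exact definition of $\gamma$-PolyakGD is not reproduced in this summary, so the role of `gamma` here, the function name `polyak_gd`, and the diagonal quadratic test problem are all illustrative assumptions rather than the authors' setup.

```python
import numpy as np

def polyak_gd(f, grad, x0, f_star, num_iters=100, gamma=1.0, tol=1e-12):
    """Gradient descent with a Polyak-type stepsize (illustrative sketch).

    Assumed update: x <- x - gamma * (f(x) - f_star) / ||grad f(x)||^2 * grad f(x).
    The scaling convention for gamma is an assumption, not taken from the paper.
    """
    x = np.asarray(x0, dtype=float)
    gaps = [f(x) - f_star]  # optimality gap at each iterate
    for _ in range(num_iters):
        g = grad(x)
        gap = f(x) - f_star
        g_norm_sq = float(g @ g)
        # Stop when the gap is negligible or the gradient vanishes.
        if gap <= tol or g_norm_sq == 0.0:
            break
        x = x - gamma * gap / g_norm_sq * g
        gaps.append(f(x) - f_star)
    return x, gaps

# Hypothetical test problem: a strongly convex quadratic with condition number 10.
A = np.diag([1.0, 10.0])
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x_final, gaps = polyak_gd(f, grad, x0=[1.0, 1.0], f_star=0.0, num_iters=200)
print(f"iterations run: {len(gaps) - 1}, final gap: {gaps[-1]:.3e}")
```

Note that the stepsize needs no smoothness or strong convexity constant, only $f^\star$, which is the sense in which the abstract describes the method as adapting without prior knowledge of problem parameters.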

Paper Structure

This paper contains 29 sections, 16 theorems, 112 equations, 2 figures, 1 table, 1 algorithm.

Key Result

Proposition 3.1

Let $K \ge 1$ be the total iteration count. Consider the Huber loss defined by [equation not recovered from the source]. Then, for any initial point $x^0 \in \mathbb{R}^n$, $1$-PolyakGD converges to $x^\star = 0$ in two steps.

Figures (2)

  • Figure 1: Behavior of $\gamma$-PolyakGD on the worst-case function
  • Figure 2: Instability in the presence of floating-point error allows PolyakGD to escape the worst-case

Theorems & Definitions (29)

  • Definition 2.1
  • Definition 2.2
  • Proposition 3.1
  • Theorem 3.1
  • proof
  • Theorem 3.2
  • proof
  • Remark 3.1
  • Theorem 3.3
  • Remark 3.2
  • ...and 19 more