On the Complexity of Finding Small Subgradients in Nonsmooth Optimization

Guy Kornowski, Ohad Shamir

TL;DR

The paper analyzes the oracle complexity of obtaining $(\delta,\epsilon)$-stationary points of Lipschitz (possibly nonsmooth) functions under first-order oracle access. It proves a fundamental gap: no deterministic algorithm can achieve a dimension-free rate, whereas for smooth functions the known randomized rate can be derandomized with only logarithmic dependence on the smoothness parameter. It provides several lower bounds that hold for any (possibly randomized) first-order method, including $\Omega(1/\epsilon^2)$-type and $\Omega(\log(1/\delta))$-type bounds, and shows that in general no finite-time algorithm can produce points with small subgradients, even for convex functions. For convex functions, the authors obtain improved rates for a relaxation to being $\delta$-close to an $\epsilon$-stationary point, revealing a trade-off between domain bounds and relaxation strength. Overall, the work maps the complexity landscape of finding small subgradients in nonsmooth optimization and leaves open questions about tightening the dependence on $\delta$ and $\epsilon$ and about dimension-dependent bounds.

Abstract

We study the oracle complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz functions, in the sense proposed by Zhang et al. [2020]. While there exist dimension-free randomized algorithms for producing such points within $\widetilde{O}(1/\delta\epsilon^3)$ first-order oracle calls, we show that no dimension-free rate can be achieved by a deterministic algorithm. On the other hand, we point out that this rate can be derandomized for smooth functions with merely a logarithmic dependence on the smoothness parameter. Moreover, we establish several lower bounds for this task which hold for any randomized algorithm, with or without convexity. Finally, we show how the convergence rate of finding $(\delta,\epsilon)$-stationary points can be improved in case the function is convex, a setting which we motivate by proving that in general no finite time algorithm can produce points with small subgradients even for convex functions.
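
For reference, $(\delta,\epsilon)$-stationarity in the sense of Zhang et al. [2020] is usually stated through the Goldstein $\delta$-subdifferential. The display below is the standard form of that definition; the symbols $\partial_\delta f$, $B_\delta(x)$ (the closed ball of radius $\delta$ around $x$) and $\partial f$ (the Clarke subdifferential) are notation assumed here, not taken from this page.

$$
\partial_\delta f(x) := \mathrm{conv}\Big(\bigcup_{y\in B_\delta(x)} \partial f(y)\Big),
\qquad
x \text{ is } (\delta,\epsilon)\text{-stationary} \iff \min_{g\in\partial_\delta f(x)} \|g\| \le \epsilon .
$$

Setting $\delta=0$ recovers ordinary $\epsilon$-stationarity, $\min_{g\in\partial f(x)}\|g\|\le\epsilon$, which is the "small subgradient" condition referred to in the title.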

Paper Structure (18 sections, 16 theorems, 49 equations, 1 figure, 3 algorithms)

Key Result

Theorem 1

For any deterministic first-order algorithm and any iteration budget $T\in\mathbb{N}$, there exists a $1$-Lipschitz function $f:\mathbb{R}^d\to\mathbb{R}$ with $d=T+2$, such that $f(x_1)-\inf_{x}f(x)\leq 1$, yet the $T$ iterates produced by the algorithm when applied to $f$ are not $(\delta,\epsilon)$-stationary ...
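
To make the notion of $(\delta,\epsilon)$-stationarity in the theorem concrete, here is a minimal one-dimensional sketch for $f(x)=|x|$. It is an illustration only, not an algorithm from the paper. It assumes the Goldstein-style definition (the minimal-norm element of the convex hull of subgradients within distance $\delta$ of $x$ has norm at most $\epsilon$) and approximates that hull by sampling gradients on a grid. The function names `abs_grad`, `min_norm_goldstein_1d` and `is_delta_eps_stationary` are hypothetical.

```python
def abs_grad(y):
    """One subgradient of f(x) = |x| at y (0 is chosen at the kink y = 0)."""
    return (y > 0) - (y < 0)


def min_norm_goldstein_1d(grad, x, delta, n_samples=1001):
    """Approximate the minimal-norm element of the delta-subdifferential in 1D.

    Gradients are sampled on a uniform grid over [x - delta, x + delta].
    In one dimension their convex hull is the interval [lo, hi]; its
    minimal-norm element is 0 if the interval contains 0, and otherwise
    the endpoint closest to 0. This is a sampling approximation (an
    assumption of this sketch), not an exact computation.
    """
    step = 2 * delta / (n_samples - 1)
    grads = [grad(x - delta + i * step) for i in range(n_samples)]
    lo, hi = min(grads), max(grads)
    if lo <= 0 <= hi:
        return 0.0
    return float(min(abs(lo), abs(hi)))


def is_delta_eps_stationary(grad, x, delta, eps):
    """Check (delta, eps)-stationarity under the sampled approximation."""
    return min_norm_goldstein_1d(grad, x, delta) <= eps


if __name__ == "__main__":
    # x = 0.05 has gradient 1, yet the kink at 0 lies within delta = 0.1.
    print(is_delta_eps_stationary(abs_grad, 0.05, 0.1, 0.01))
    # x = 0.5 is too far from the kink for delta = 0.1.
    print(is_delta_eps_stationary(abs_grad, 0.5, 0.1, 0.5))
```

The first point shows why the relaxation is weaker than $\epsilon$-stationarity: the gradient at $x=0.05$ has norm $1$, but gradients of both signs occur within distance $\delta$, so their convex hull contains $0$.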

Figures (1)

  • Figure 1: Illustration of $I_0,I_1$ and $h_0,h_1$.

Theorems & Definitions (26)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Claim 1
  • Theorem 5
  • Theorem 6
  • Lemma 1
  • proof
  • Definition 1
  • ...and 16 more