The Price of Adaptivity in Stochastic Convex Optimization

Yair Carmon; Oliver Hinder

TL;DR

This work defines the price of adaptivity (PoA), which quantifies how much uncertainty in problem parameters inflates suboptimality in non-smooth stochastic convex optimization (SCO). It proves three information-theoretic PoA lower bounds: a logarithmic bound for expected suboptimality when the initial distance to the optimum is unknown, a double-logarithmic bound for median suboptimality in the same setting, and a polynomial bound when both the distance and the Lipschitz constant (gradient norm bound) are unknown, with a concurrent lower bound for second-moment Lipschitz and related settings. The lower bounds are constructed via reductions to coin bias testing, noisy binary search, and rare-event embeddings. They nearly match existing upper bounds from prior work, giving a near-complete picture of the price of adaptivity. The results also yield matching minimax quantile bounds for known-parameter cases and show that high-probability guarantees can be significantly more robust than expectation guarantees in certain adaptive settings. Overall, the paper delineates fundamental limits for tuning-free algorithms in SCO and clarifies how noise tails, parameter uncertainty, and access models shape achievable performance.

Abstract

We prove impossibility results for adaptivity in non-smooth stochastic convex optimization. Given a set of problem parameters we wish to adapt to, we define a "price of adaptivity" (PoA) that, roughly speaking, measures the multiplicative increase in suboptimality due to uncertainty in these parameters. When the initial distance to the optimum is unknown but a gradient norm bound is known, we show that the PoA is at least logarithmic for expected suboptimality, and double-logarithmic for median suboptimality. When there is uncertainty in both distance and gradient norm, we show that the PoA must be polynomial in the level of uncertainty. Our lower bounds nearly match existing upper bounds, and establish that there is no parameter-free lunch. En route, we also establish tight upper and lower bounds for (known-parameter) high-probability stochastic convex optimization with heavy-tailed and bounded noise, respectively.
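
As a rough sketch of the quantity the abstract describes (the notation here is illustrative and is not taken from the paper): let $\Theta$ index the possible values of the unknown parameters, let $\mathcal{I}_\theta$ be the class of problems with parameter value $\theta$, and let $\mathrm{Err}(\mathcal{A}, \mathcal{I}_\theta)$ denote the worst-case suboptimality of an algorithm $\mathcal{A}$ over $\mathcal{I}_\theta$, measured in expectation or at a quantile such as the median. A multiplicative price of adaptivity could then be written as

$$
\mathrm{PoA}(\mathcal{A}) \;=\; \sup_{\theta \in \Theta} \frac{\mathrm{Err}(\mathcal{A}, \mathcal{I}_\theta)}{\inf_{\mathcal{A}'} \mathrm{Err}(\mathcal{A}', \mathcal{I}_\theta)},
$$

where the infimum ranges over algorithms $\mathcal{A}'$ that are allowed to know $\theta$. Under this reading, $\mathrm{PoA}(\mathcal{A}) \ge 1$ always holds, and a value close to 1 would mean that not knowing $\theta$ costs almost nothing.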

Paper Structure

This paper contains 36 sections, 8 theorems, 112 equations, and 1 table.

Introduction
Paper organization and suggested reading order.
Notation.
Overview of proof techniques
\ref{thm:log-PoA-lb}.
\ref{thm:loglog-PoA-lb}.
\ref{thm:poly-PoA-lb}.
Information-theoretic hardness via bit-counting (\ref{app:info-bounds}).
Related work
Online convex optimization.
Non-stochastic optimization.
Concurrent work.
Information-theoretic hardness lemmas
A general mutual information bound
Hardness of sharpening a biased Bernoulli prior
...and 21 more sections

Key Result

Lemma 1

For any $T\in \mathbb{N}$, any $\varepsilon \in [0,1]$, and any random variable $V$, let $S_1,\ldots, S_T$ be a sequence of binary random variables such that, for all $t\le T$, [...] with probability 1 w.r.t. $S_1,\ldots, S_{t-1}$ and $V$. Then, for every randomized estimator $\hat{V}$ that depends on $V$ only through $S_1, \ldots, S_T$, we have [...]. If in addition $\varepsilon \le \frac{1}{2}$ and, for some $

Theorems & Definitions (19)

Lemma 1
proof
Lemma 2
proof
Lemma 3
proof
proof
Proposition 1a
proof
Proposition 1b
...and 9 more
