Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems

Moise Blanchard

TL;DR

The paper establishes oracle-complexity lower bounds for solving the feasibility problem under memory constraints, showing that gradient descent with linear memory is Pareto-optimal in the memory-oracle tradeoff up to polylog factors. By constructing a multi-layer hard class of feasibility instances and introducing adaptive depth- and probing-subspace games, the author derives exponential-in-depth lower bounds that separate memory usage from oracle complexity. The results reveal a sharp phase transition: with subquadratic memory in the dimension $d$, deterministic algorithms pay a polynomial-in-$1/\epsilon$ cost in oracle queries, while quadratic memory allows cutting-plane methods to achieve $O(d\ln(1/\epsilon))$ queries; similar, though weaker, phase behavior appears for randomized algorithms. The results unify and extend prior bounds, clarifying fundamental limits of memory-constrained optimization in the feasibility setting and suggesting that practical methods like gradient descent are, in a precise sense, optimal within the studied tradeoff framework. The techniques (layered hard constructions, adaptive feasibility procedures, and orthogonal-subspace analyses) may inform future advances for convex optimization and related first-order methods under memory constraints.

Abstract

In this paper we provide oracle complexity lower bounds for finding a point in a given set using a memory-constrained algorithm that has access to a separation oracle. We assume that the set is contained within the unit $d$-dimensional ball and contains a ball of known radius $\epsilon>0$. This setup is commonly referred to as the feasibility problem. We show that to solve feasibility problems with accuracy $\epsilon\geq e^{-d^{o(1)}}$, any deterministic algorithm either uses $d^{1+\delta}$ bits of memory or must make at least $1/(d^{0.01\delta}\epsilon^{2\frac{1-\delta}{1+1.01\delta}-o(1)})$ oracle queries, for any $\delta\in[0,1]$. Additionally, we show that randomized algorithms either use $d^{1+\delta}$ memory or make at least $1/(d^{2\delta}\epsilon^{2(1-4\delta)-o(1)})$ queries for any $\delta\in[0,\frac{1}{4}]$. Because gradient descent only uses linear memory $\mathcal O(d\ln 1/\epsilon)$ but makes $\Omega(1/\epsilon^2)$ queries, our results imply that it is Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results show that the oracle complexity for deterministic algorithms is always polynomial in $1/\epsilon$ if the algorithm has less than quadratic memory in $d$. This reveals a sharp phase transition since with quadratic $\mathcal O(d^2\ln 1/\epsilon)$ memory, cutting plane methods only require $\mathcal O(d\ln 1/\epsilon)$ queries.
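
To make the setup concrete, the following is a minimal sketch of the feasibility problem and of a perceptron-style subgradient step, standing in here for what the abstract calls gradient descent. It is an illustration under stated assumptions, not the paper's construction: the target set is taken to be a single ball, and the names `make_ball_oracle` and `subgradient_feasibility` are hypothetical. Because the set contains a ball of radius $\epsilon$ around some center $c$, every separating direction $g$ returned at an infeasible query $x$ satisfies $\langle g, x-c\rangle\geq\epsilon$, so a step of length $\epsilon$ along $-g$ reduces $\|x-c\|^2$ by at least $\epsilon^2$, which gives the $1/\epsilon^2$ query count while storing only the current iterate.

```python
import math


def make_ball_oracle(center, radius):
    """Separation oracle for the (assumed, illustrative) set Q = B(center, radius)."""
    def oracle(x):
        diff = [xi - ci for xi, ci in zip(x, center)]
        dist = math.sqrt(sum(v * v for v in diff))
        if dist <= radius:
            return None  # x is feasible
        # Unit vector g with <g, x - y> > 0 for every y in Q.
        return [v / dist for v in diff]
    return oracle


def subgradient_feasibility(oracle, d, eps):
    """Step by eps against the separating direction until a feasible point is found.

    Only the current iterate (d numbers) is kept between queries.
    Returns the feasible point and the number of oracle queries used.
    """
    x = [0.0] * d
    # Each infeasible query shrinks the squared distance to the center of the
    # inner ball by at least eps^2, starting from at most 1.
    max_queries = math.ceil(1.0 / eps ** 2) + 1
    for queries in range(1, max_queries + 1):
        g = oracle(x)
        if g is None:
            return x, queries
        x = [xi - eps * gi for xi, gi in zip(x, g)]
    raise RuntimeError("query budget exceeded; the oracle may be inconsistent")


# Example: Q is a ball of radius 0.25 inside the unit ball, accuracy eps = 0.1.
eps = 0.1
oracle = make_ball_oracle([0.5, 0.0, 0.0, 0.0, 0.0], 0.25)
x, queries = subgradient_feasibility(oracle, d=5, eps=eps)
```

On this easy instance the method stops well before the worst-case budget; the lower bounds in the abstract concern adversarially constructed sets, for which no linear-memory method can do substantially better than $1/\epsilon^2$ queries.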

Paper Structure (31 sections, 30 theorems, 223 equations, 1 figure, 18 algorithms)

Key Result

Theorem 1

Fix $\alpha\in(0,1]$. Let $d$ be a sufficiently large integer (depending on $\alpha$) and $\frac{1}{\sqrt d}\geq \epsilon \geq e^{-d^{o(1)}}$. Then, for any $\delta\in[0,1]$, any deterministic algorithm solving feasibility problems up to accuracy $\epsilon$ either uses $M=d^{1+\delta}$ bits of memory ...

Figures (1)

  • Figure 1: Tradeoffs between available memory and oracle complexity for the feasibility problem with accuracy $\epsilon$ in dimension $d$, in the regime $\frac{1}{\sqrt d}\geq \epsilon \geq e^{-d^{o(1)}}$ (adapted from woodworth2019open). The dashed pink (resp. green) region corresponds to historical information-theoretic lower bounds (resp. upper bounds). Regions 1 and 2 correspond to the lower-bound tradeoffs for randomized algorithms from marsden2022efficient and chen2023memory, respectively. Region 3 corresponds to the lower bound from blanchard2023quadratic for deterministic algorithms. In this work, we show that the red (resp. pink) solid region is not achievable for deterministic (resp. randomized) algorithms.
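
As a rough check on how the deterministic tradeoff interpolates between its two ends, the exponent of $1/\epsilon$ in the bound quoted in the abstract can be evaluated at a few memory levels $M=d^{1+\delta}$. The name $p(\delta)$ is introduced here for illustration only, and the $o(1)$ term and the $d^{0.01\delta}$ prefactor are ignored:

$$p(\delta)=\frac{2(1-\delta)}{1+1.01\,\delta},\qquad p(0)=2,\qquad p\!\left(\tfrac{1}{2}\right)=\frac{1}{1.505}\approx 0.66,\qquad p(1)=0.$$

At linear memory the exponent matches the $1/\epsilon^2$ query count of gradient descent, and it vanishes only at quadratic memory, which is the regime where cutting-plane methods apply.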

Theorems & Definitions (34)

  • Theorem 1
  • Theorem 2
  • Definition 3: Memory-constrained algorithm
  • Definition 4: Exploratory queries
  • Theorem 5
  • Theorem 6
  • Lemma 7
  • Corollary 8
  • Theorem 9
  • Definition 10: Proper periods
  • ...and 24 more