Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems

Moise Blanchard

TL;DR

The paper establishes oracle-complexity lower bounds for solving the feasibility problem under memory constraints, showing that gradient descent with linear memory is Pareto-optimal in the memory/oracle-complexity tradeoff up to lower-order factors. By constructing a multi-layer hard class of feasibility instances and analyzing it through adaptive depth and probing-subspace games, it derives query lower bounds that depend polynomially on $1/\epsilon$ whenever memory is limited. The bounds reveal a sharp phase transition: with subquadratic memory in the dimension $d$, deterministic algorithms must make a number of oracle queries polynomial in $1/\epsilon$, whereas with quadratic memory cutting-plane methods need only $O(d\ln(1/\epsilon))$ queries. A similar but weaker tradeoff holds for randomized algorithms. The results unify and extend prior bounds and clarify the fundamental limits of memory-constrained optimization in the feasibility setting: within the studied tradeoff, gradient descent cannot be improved in both memory and query count at once. The techniques (layered hard constructions, adaptive feasibility procedures, and orthogonal-subspace analyses) may inform future work on convex optimization and related first-order methods under memory constraints.

Abstract

In this paper we provide oracle complexity lower bounds for finding a point in a given set using a memory-constrained algorithm that has access to a separation oracle. We assume that the set is contained within the unit $d$-dimensional ball and contains a ball of known radius $ε>0$. This setup is commonly referred to as the feasibility problem. We show that to solve feasibility problems with accuracy $ε\geq e^{-d^{o(1)}}$, any deterministic algorithm either uses $d^{1+δ}$ bits of memory or must make at least $1/(d^{0.01δ}ε^{2\frac{1-δ}{1+1.01 δ}-o(1)})$ oracle queries, for any $δ\in[0,1]$. Additionally, we show that randomized algorithms either use $d^{1+δ}$ memory or make at least $1/(d^{2δ} ε^{2(1-4δ)-o(1)})$ queries for any $δ\in[0,\frac{1}{4}]$. Because gradient descent only uses linear memory $\mathcal O(d\ln 1/ε)$ but makes $Ω(1/ε^2)$ queries, our results imply that it is Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results show that the oracle complexity for deterministic algorithms is always polynomial in $1/ε$ if the algorithm has less than quadratic memory in $d$. This reveals a sharp phase transition since with quadratic $\mathcal O(d^2 \ln1/ε)$ memory, cutting plane methods only require $\mathcal O(d\ln 1/ε)$ queries.
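The upper-bound side of this tradeoff, gradient descent with linear memory and at most about $1/ε^2$ queries, can be illustrated with a minimal sketch. The oracle convention below is an assumption made for illustration: it returns `None` when the query point is feasible, and otherwise a unit vector `g` with $\langle g, x-y\rangle\geq 0$ for every feasible $y$. The names `ball_oracle` and `gradient_descent_feasibility` are hypothetical, and the toy feasible set is a single ball rather than one of the paper's hard instances.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]
# Assumed oracle convention: None means "x is feasible"; otherwise a unit
# vector g with <g, x - y> >= 0 for every y in the feasible set.
Oracle = Callable[[Vector], Optional[Vector]]


def ball_oracle(center: Vector, radius: float) -> Oracle:
    """Separation oracle for the toy set {y : ||y - center|| <= radius}."""
    def oracle(x: Vector) -> Optional[Vector]:
        diff = [xi - ci for xi, ci in zip(x, center)]
        dist = math.sqrt(sum(v * v for v in diff))
        if dist <= radius:
            return None
        # Unit vector pointing from the center toward x separates x from the set.
        return [v / dist for v in diff]
    return oracle


def gradient_descent_feasibility(
    oracle: Oracle, d: int, eps: float
) -> Tuple[Vector, int]:
    """Step against the separating direction with fixed step size eps.

    Only the current iterate (d numbers) is kept between queries.
    Returns the feasible point found and the number of oracle queries used.
    """
    x = [0.0] * d
    # If the set contains a ball of radius eps inside the unit ball, each
    # infeasible query shrinks the squared distance to that ball's center by
    # at least eps**2, so at most about 1/eps**2 infeasible queries can occur.
    max_queries = math.ceil(1.0 / eps ** 2) + 1
    for queries in range(1, max_queries + 1):
        g = oracle(x)
        if g is None:
            return x, queries
        x = [xi - eps * gi for xi, gi in zip(x, g)]
    raise RuntimeError("query budget exhausted; oracle assumptions violated")


if __name__ == "__main__":
    toy_oracle = ball_oracle([0.3, -0.4, 0.2, 0.0], 0.1)
    point, used = gradient_descent_feasibility(toy_oracle, d=4, eps=0.1)
    print(point, used)
```

The sketch stores the iterate as $d$ floating-point numbers; the $\mathcal O(d\ln 1/ε)$ bit count in the abstract corresponds to keeping each coordinate at a precision that scales with $\ln 1/ε$, which this toy version does not enforce.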

Paper Structure

This paper contains 31 sections, 30 theorems, 223 equations, 1 figure, 18 algorithms.

Introduction
Previous results on oracle complexity/memory tradeoffs for convex optimization and feasibility problems.
Our contribution.
On the tightness of the results.
Additional works on learning with memory constraints.
Outline of the paper
Formal setup and notations
Notations.
Technical overview of the proofs
Challenges for having $\epsilon$-dependent query lower bounds
Construction of the hard class of feasibility problems
Structure of the proof for deterministic algorithms.
Properties of probing subspaces.
Query lower bounds for an Orthogonal Subspace Game.
Reduction from the feasibility procedure to the Orthogonal Subspace Game.
...and 16 more sections

Key Result

Theorem 1

Fix $\alpha\in(0,1]$. Let $d$ be a sufficiently large integer (depending on $\alpha$) and $\frac{1}{\sqrt d}\geq \epsilon \geq e^{-d^{o(1)}}$. Then, for any $\delta\in[0,1]$, any deterministic algorithm solving feasibility problems up to accuracy $\epsilon$ either uses $M=d^{1+\delta}$ bits of memor

Figures (1)

Figure 1: Tradeoffs between available memory and oracle complexity for the feasibility problem with accuracy $\epsilon$ in dimension $d$, in the regime $\frac{1}{\sqrt d}\geq \epsilon \geq e^{-d^{o(1)}}$ (adapted from woodworth2019open). The dashed pink (resp. green) region corresponds to historical information-theoretic lower bounds (resp. upper bounds). Regions 1 and 2 correspond to the lower-bound tradeoffs for randomized algorithms from marsden2022efficient and chen2023memory, respectively. Region 3 corresponds to the lower bound for deterministic algorithms from blanchard2023quadratic. This work shows that the red (resp. pink) solid region is not achievable for deterministic (resp. randomized) algorithms.
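
To make the deterministic tradeoff concrete, the bound quoted in the abstract can be evaluated at a few memory levels $M=d^{1+\delta}$. This is only a back-of-the-envelope reading of that formula, ignoring the $o(1)$ term and the $d^{0.01\delta}$ factor; the shorthand $p(\delta)$ for the exponent of $1/\epsilon$ is introduced here for illustration and is not notation from the paper.

$$
p(\delta)=\frac{2(1-\delta)}{1+1.01\,\delta},\qquad p(0)=2,\qquad p(1/2)=\frac{1}{1.505}\approx 0.66,\qquad p(1)=0.
$$

At linear memory the exponent matches the $1/\epsilon^2$ query count of gradient descent. It stays positive for every $\delta<1$, so the query bound remains polynomial in $1/\epsilon$, and it vanishes only at quadratic memory, where cutting-plane methods need $\mathcal O(d\ln 1/\epsilon)$ queries.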

Theorems & Definitions (34)

Theorem 1
Theorem 2
Definition 3: Memory-constrained algorithm
Definition 4: Exploratory queries
Theorem 5
Theorem 6
Lemma 7
Corollary 8
Theorem 9
Definition 10: Proper periods
...and 24 more
