Gradient Descent is Pareto-Optimal in the Oracle Complexity and Memory Tradeoff for Feasibility Problems
Moise Blanchard
TL;DR
The paper establishes oracle-complexity lower bounds for solving the feasibility problem under memory constraints, showing that gradient descent with linear memory is Pareto-optimal in the memory-oracle tradeoff up to polylog factors. By constructing a multi-layer hard class of feasibility instances and introducing adaptive depth- and probing-subspace games, the author derives exponential-in-depth lower bounds that separate memory usage from oracle complexity. The bounds reveal a sharp phase transition: with subquadratic memory in dimension $d$, deterministic algorithms pay a polynomial-in-$1/\epsilon$ cost in oracle queries, while quadratic memory allows cutting-plane methods to achieve $O(d\ln(1/\epsilon))$ queries; similar, though weaker, phase behavior appears for randomized algorithms. The results unify and extend prior bounds, clarifying fundamental limits of memory-constrained optimization in the feasibility setting and suggesting that practical methods like gradient descent are, in a precise sense, optimal within the studied tradeoff framework. The techniques (layered hard constructions, adaptive feasibility procedures, and orthogonal-subspace analyses) may inform future advances for convex optimization and related first-order methods under memory constraints.
Abstract
In this paper we provide oracle complexity lower bounds for finding a point in a given set using a memory-constrained algorithm that has access to a separation oracle. We assume that the set is contained within the unit $d$-dimensional ball and contains a ball of known radius $ε>0$. This setup is commonly referred to as the feasibility problem. We show that to solve feasibility problems with accuracy $ε\geq e^{-d^{o(1)}}$, any deterministic algorithm either uses $d^{1+δ}$ bits of memory or must make at least $1/(d^{0.01δ}ε^{2\frac{1-δ}{1+1.01 δ}-o(1)})$ oracle queries, for any $δ\in[0,1]$. Additionally, we show that randomized algorithms either use $d^{1+δ}$ memory or make at least $1/(d^{2δ} ε^{2(1-4δ)-o(1)})$ queries for any $δ\in[0,\frac{1}{4}]$. Because gradient descent only uses linear memory $\mathcal O(d\ln 1/ε)$ but makes $Ω(1/ε^2)$ queries, our results imply that it is Pareto-optimal in the oracle complexity/memory tradeoff. Further, our results show that the oracle complexity for deterministic algorithms is always polynomial in $1/ε$ if the algorithm has less than quadratic memory in $d$. This reveals a sharp phase transition since with quadratic $\mathcal O(d^2 \ln1/ε)$ memory, cutting plane methods only require $\mathcal O(d\ln 1/ε)$ queries.
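To make the setup concrete, the following is a minimal sketch of the feasibility problem with a separation oracle, solved by a projected gradient-descent-style update that stores only the current iterate. It is an illustration under stated assumptions, not code from the paper: the names `make_ball_oracle`, `project_unit_ball`, and `feasibility_gd` are hypothetical, the target set is taken to be a single ball for simplicity, and the paper's bit-level memory accounting is not modeled. The query budget follows from the standard argument: if the set contains a ball of radius $ε$ centered at $c$, every returned separating unit vector $g$ satisfies $\langle g, x - c\rangle \geq ε$, so a step of size $ε$ decreases $\|x - c\|^2$ by at least $ε^2$, and at most $1/ε^2$ infeasible queries can occur.

```python
import math


def make_ball_oracle(center, radius):
    """Hypothetical separation oracle for the set Q = B(center, radius)."""
    def oracle(x):
        diff = [xi - ci for xi, ci in zip(x, center)]
        dist = math.sqrt(sum(v * v for v in diff))
        if dist <= radius:
            return None  # x is feasible, nothing to separate
        # Unit vector g with <g, x - z> > 0 for every z in Q.
        return [v / dist for v in diff]
    return oracle


def project_unit_ball(x):
    """Euclidean projection onto the unit ball."""
    norm = math.sqrt(sum(v * v for v in x))
    return x if norm <= 1.0 else [v / norm for v in x]


def feasibility_gd(oracle, dim, eps):
    """Return (feasible point, number of oracle queries used).

    Only the current iterate x (dim numbers) is kept between queries.
    Assumes Q lies in the unit ball and contains a ball of radius eps.
    """
    x = [0.0] * dim
    # At most 1/eps^2 infeasible queries, plus one final feasible query.
    max_queries = math.ceil(1.0 / eps**2) + 1
    for queries in range(1, max_queries + 1):
        g = oracle(x)
        if g is None:
            return x, queries
        # Step of size eps against the separating direction, then project.
        x = project_unit_ball([xi - eps * gi for xi, gi in zip(x, g)])
    raise RuntimeError("query budget exhausted; is the eps-ball assumption met?")


if __name__ == "__main__":
    oracle = make_ball_oracle([0.5, 0.0, 0.0, 0.0, 0.0], 0.1)
    point, used = feasibility_gd(oracle, dim=5, eps=0.1)
    print(point, used)
```

The sketch shows the two ends of the tradeoff in the abstract from the algorithm's side: the state carried between queries is a single $d$-dimensional vector, while the worst-case query budget scales as $1/ε^2$ rather than $d\ln 1/ε$.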
