Lower Complexity Bounds for Nonconvex-Strongly-Convex Bilevel Optimization with First-Order Oracles

Kaiyi Ji

TL;DR

This work develops new hard instances that yield nontrivial lower bounds under deterministic and stochastic first-order oracle models and proves that any first-order zero-respecting algorithm requires at least $\Omega(\kappa^{3/2}\epsilon^{-2})$ oracle calls to find an $\epsilon$-accurate stationary point.

Abstract

Although upper bound guarantees for bilevel optimization have been widely studied, progress on lower bounds has been limited due to the complexity of the bilevel structure. In this work, we focus on the smooth nonconvex-strongly-convex setting and develop new hard instances that yield nontrivial lower bounds under deterministic and stochastic first-order oracle models. In the deterministic case, we prove that any first-order zero-respecting algorithm requires at least $\Omega(\kappa^{3/2}\epsilon^{-2})$ oracle calls to find an $\epsilon$-accurate stationary point, improving the optimal lower bounds known for single-level nonconvex optimization and for nonconvex-strongly-convex min-max problems. In the stochastic case, we show that at least $\Omega(\kappa^{5/2}\epsilon^{-4})$ stochastic oracle calls are necessary, again strengthening the best known bounds in related settings. Our results expose substantial gaps between current upper and lower bounds for bilevel optimization and suggest that even simplified regimes, such as those with quadratic lower-level objectives, warrant further investigation toward understanding the optimal complexity of bilevel optimization under standard first-order oracles.
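To get a feel for how the two bounds scale, the following minimal sketch evaluates $\kappa^{3/2}\epsilon^{-2}$ and $\kappa^{5/2}\epsilon^{-4}$ with the hidden constants set to 1. That choice is an assumption made purely for illustration, since the $\Omega(\cdot)$ notation leaves the constants unspecified, and the function names are hypothetical.

```python
def deterministic_lower_bound(kappa: float, eps: float) -> float:
    """Scale of the deterministic bound, kappa^(3/2) * eps^(-2).

    The constant hidden by the Omega notation is assumed to be 1.
    """
    return kappa ** 1.5 * eps ** -2


def stochastic_lower_bound(kappa: float, eps: float) -> float:
    """Scale of the stochastic bound, kappa^(5/2) * eps^(-4).

    The constant hidden by the Omega notation is assumed to be 1.
    """
    return kappa ** 2.5 * eps ** -4


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for kappa in (1.0, 10.0, 100.0):
        for eps in (1e-1, 1e-2):
            det = deterministic_lower_bound(kappa, eps)
            sto = stochastic_lower_bound(kappa, eps)
            print(f"kappa={kappa:g} eps={eps:g} det~{det:.3g} sto~{sto:.3g}")
```

With these unit constants, the ratio of the stochastic to the deterministic scale is $\kappa\,\epsilon^{-2}$, so the gap between the two regimes widens as either the condition number grows or the target accuracy tightens.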

Paper Structure

This paper contains 25 sections, 11 theorems, 72 equations, 1 figure.

Key Result

Theorem 1

For any $L_f, L_g, \mu, \Delta, \epsilon > 0$ satisfying $\kappa = L_g / \mu \ge 1$ and $\frac{\Delta}{L_f}=\mathcal{O}(1)$, there exist functions $f : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^m \times \mathbb{R}^n \to \mathbb{R}$ such that $\{f, g\} \in \mathcal{F}(L_f, \ldots)$ … oracle calls, where $H(\mathbf{x}) = f(\mathbf{x}; \mathbf{y}^*(\mathbf{x}))$ with $\mathbf{y}^*(\mathbf{x})$ …
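As a concrete illustration of the objective $H(\mathbf{x}) = f(\mathbf{x}; \mathbf{y}^*(\mathbf{x}))$, the sketch below computes its gradient for a hypothetical toy instance with a quadratic lower level. It assumes the standard bilevel convention that $\mathbf{y}^*(\mathbf{x})$ minimizes $g(\mathbf{x}, \cdot)$ and that $\epsilon$-stationarity means $\|\nabla H(\mathbf{x})\| \le \epsilon$; neither is spelled out in the truncated statement above. The functions $f$ and $g$ used here, the diagonal lower-level Hessian, and all names are assumptions for illustration, not the paper's hard instance.

```python
import math

# Hypothetical toy instance (NOT the paper's construction):
#   g(x, y) = 0.5 * sum_i a[i] * y[i]**2 - sum_i y[i] * (B x)[i]
#   f(x, y) = sum_j cos(x[j]) + 0.5 * sum_i y[i]**2
# g is strongly convex in y when every a[i] > 0, with
# mu = min(a), L_g = max(a), and kappa = L_g / mu.


def matvec(B, x):
    """Dense matrix-vector product B @ x on plain Python lists."""
    return [sum(b * xj for b, xj in zip(row, x)) for row in B]


def y_star(a, B, x):
    """Lower-level minimizer: solves a[i] * y[i] = (B x)[i]."""
    return [bx / ai for bx, ai in zip(matvec(B, x), a)]


def H(a, B, x):
    """Upper-level objective evaluated at the lower-level solution."""
    y = y_star(a, B, x)
    return sum(math.cos(xj) for xj in x) + 0.5 * sum(yi * yi for yi in y)


def hypergradient(a, B, x):
    """Gradient of H via the implicit function theorem.

    grad H = grad_x f - (d^2 g / dx dy) (d^2 g / dy^2)^{-1} grad_y f,
    which for this toy instance reduces to -sin(x) + B^T v,
    where v solves a[i] * v[i] = y*[i].
    """
    y = y_star(a, B, x)
    v = [yi / ai for yi, ai in zip(y, a)]
    return [
        -math.sin(x[j]) + sum(B[i][j] * v[i] for i in range(len(a)))
        for j in range(len(x))
    ]


def is_eps_stationary(a, B, x, eps):
    """Check the assumed stationarity criterion ||grad H(x)|| <= eps."""
    grad = hypergradient(a, B, x)
    return math.sqrt(sum(gj * gj for gj in grad)) <= eps
```

For example, with `a = [1.0, 10.0]` (so $\kappa = 10$) and `B = [[1.0, 2.0], [-1.0, 0.5]]`, calling `hypergradient(a, B, [0.3, -1.2])` returns the gradient of $H$ at that point, which can be checked against central finite differences of `H`. In a first-order oracle model an algorithm would not have direct access to the inverse of the lower-level Hessian, so this closed form serves only to make the definition of $H$ tangible.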

Figures (1)

  • Figure 1: An illustration of the zero-chain for our constructed instance in \ref{obj:construction} for nonconvex-strongly-convex bilevel optimization.

Theorems & Definitions (20)

  • Definition 1
  • Definition 2: Deterministic first-order oracle
  • Definition 3: Stochastic first-order oracle
  • Definition 4: First-order bilevel algorithm class
  • Definition 5: Zero-chain
  • Theorem 1
  • Lemma 2
  • Lemma 3
  • Lemma 4
  • Proof of \ref{th:deter}
  • ...and 10 more