
On the Complexity of First-Order Methods in Stochastic Bilevel Optimization

Jeongyeol Kwon, Dohyun Kwon, Hanbaek Lyu

TL;DR

This work analyzes the fundamental complexity of finding stationary points in stochastic bilevel optimization with a strongly convex lower level, under a $y^*$-aware oracle that provides an $O(\epsilon)$-accurate lower-level solution and locally unbiased gradients. It introduces a penalty-based approach that reduces the bilevel problem to a single-level surrogate $\mathcal{L}_{\lambda}^*(x)$ and uses inner-outer loop schemes to control bias and variance, achieving $O(\epsilon^{-6})$ complexity without stochastic smoothness and $O(\epsilon^{-4})$ with it, for $\lambda=O(\epsilon^{-1})$. The paper also derives matching lower bounds of $\Omega(\epsilon^{-6})$ and $\Omega(\epsilon^{-4})$ via probabilistic zero-chains, showing that no algorithm querying a $y^*$-aware oracle can improve on these rates under the stated assumptions. By connecting the upper and lower bounds, the results reveal that first-order methods, under mild smoothness conditions, can match certain second-order benchmarks in bilevel settings and establish tight complexity barriers for this oracle model. The findings have implications for the design of efficient bilevel optimization algorithms in meta-learning, hyperparameter tuning, and adversarial learning, clarifying when a $y^*$-aware oracle can yield substantial gains and when intrinsic difficulty limits progress.
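To make the penalty reduction concrete, here is a minimal deterministic sketch on a toy scalar problem. It assumes the surrogate takes the standard penalty form $\mathcal{L}_{\lambda}^*(x) = \min_y \left[ f(x,y) + \lambda \left( g(x,y) - \min_z g(x,z) \right) \right]$, which the summary above does not spell out. The toy functions `f` and `g`, the helper names, and the step sizes are all hypothetical choices made for illustration, and no stochastic oracle is modeled.

```python
# Toy scalar bilevel problem (hypothetical, chosen so everything has a closed form):
#   upper level  f(x, y) = 0.5*x**2 + 0.5*(y - 1)**2
#   lower level  g(x, y) = 0.5*(y - x)**2   (1-strongly convex in y, so y*(x) = x)


def grad_f(x, y):
    """Return (df/dx, df/dy) for the toy upper-level objective."""
    return x, y - 1.0


def grad_g(x, y):
    """Return (dg/dx, dg/dy) for the toy lower-level objective."""
    return x - y, y - x


def inner_gd(grad_y, y0, step, iters):
    """Plain gradient descent in y on a strongly convex objective."""
    y = y0
    for _ in range(iters):
        y -= step * grad_y(y)
    return y


def penalty_gradient(x, lam, iters=200):
    """First-order estimate of the gradient of the penalty surrogate at x."""
    # z approximates y*(x) = argmin_z g(x, z).
    z = inner_gd(lambda z_: grad_g(x, z_)[1], 0.0, 0.5, iters)
    # y approximates y*_lam(x) = argmin_y f(x, y) + lam * g(x, y).
    y = inner_gd(
        lambda y_: grad_f(x, y_)[1] + lam * grad_g(x, y_)[1],
        0.0,
        1.0 / (1.0 + lam),
        iters,
    )
    # Only first-order information is used: no Hessians of g appear.
    return grad_f(x, y)[0] + lam * (grad_g(x, y)[0] - grad_g(x, z)[0])


def true_hypergradient(x):
    """d/dx f(x, y*(x)) with y*(x) = x, i.e. d/dx [0.5*x**2 + 0.5*(x - 1)**2]."""
    return 2.0 * x - 1.0


if __name__ == "__main__":
    x = 3.0
    for lam in (1.0, 10.0, 100.0, 1000.0):
        gap = true_hypergradient(x) - penalty_gradient(x, lam)
        print(f"lambda={lam:7.1f}  gradient gap={gap:.6f}")
```

On this toy problem the gap between the true hypergradient and the surrogate gradient works out to $(x-1)/(1+\lambda)$, so it shrinks like $1/\lambda$. This is in line with the choice $\lambda = O(\epsilon^{-1})$ for reaching $\epsilon$-accuracy, though the toy example is not a substitute for the paper's analysis.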

Abstract

We consider the problem of finding stationary points in bilevel optimization when the lower-level problem is unconstrained and strongly convex. The problem has been extensively studied in recent years; the main technical challenge is to keep track of lower-level solutions $y^*(x)$ in response to the changes in the upper-level variables $x$. Subsequently, all existing approaches tie their analyses to a genie algorithm that knows lower-level solutions and, therefore, need not query any points far from them. We consider a dual question to such approaches: suppose we have an oracle, which we call $y^*$-aware, that returns an $O(\epsilon)$-estimate of the lower-level solution, in addition to first-order gradient estimators that are locally unbiased within the $\Theta(\epsilon)$-ball around $y^*(x)$. We study the complexity of finding stationary points with such a $y^*$-aware oracle: we propose a simple first-order method that converges to an $\epsilon$-stationary point using $O(\epsilon^{-6})$, $O(\epsilon^{-4})$ accesses to first-order $y^*$-aware oracles. Our upper bounds also apply to standard unbiased first-order oracles, improving the best-known complexity of first-order methods by $O(\epsilon)$ with minimal assumptions. We then provide the matching $\Omega(\epsilon^{-6})$, $\Omega(\epsilon^{-4})$ lower bounds without and with an additional smoothness assumption on $y^*$-aware oracles, respectively. Our results imply that any approach that simulates an algorithm with a $y^*$-aware oracle must suffer the same lower bounds.

Paper Structure (58 sections, 29 theorems, 149 equations, 1 algorithm)


Key Result

Theorem 3.1

Suppose Assumptions assumption:nice_functions and assumption:hessian_lipschitz_g hold, and let $\lambda = \max\left( \frac{\lambda_0}{\epsilon}, \frac{6 l_{f,0}}{\mu_g r} \right) \asymp \epsilon^{-1}$ and $r_{\lambda} = \frac{l_{f,0}}{\mu_g \lambda}$, where $\lambda_0 := \frac{4 l_{f,0} l_{g,1}}{\mu_g^2}$ …
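As a quick arithmetic check of the parameter choices in the theorem statement, the sketch below evaluates $\lambda_0$, $\lambda$, and $r_{\lambda}$ for given constants. It takes the definition of $\lambda_0$ at face value even though the statement is cut off in this summary. The function name `theorem_parameters` and the numeric constants are hypothetical. The meaning of each symbol is not given here; reading $l_{f,0}$ and $l_{g,1}$ as Lipschitz and smoothness constants, $\mu_g$ as the strong-convexity modulus, and $r$ as a radius is an assumption.

```python
def theorem_parameters(eps, l_f0, l_g1, mu_g, r):
    """Evaluate lambda_0, lambda and r_lambda from the formulas in the statement.

    All inputs are assumed to be positive floats.
    """
    lam0 = 4.0 * l_f0 * l_g1 / mu_g**2               # lambda_0 := 4 l_{f,0} l_{g,1} / mu_g^2
    lam = max(lam0 / eps, 6.0 * l_f0 / (mu_g * r))   # lambda = max(lambda_0/eps, 6 l_{f,0}/(mu_g r))
    r_lam = l_f0 / (mu_g * lam)                      # r_lambda = l_{f,0} / (mu_g lambda)
    return lam0, lam, r_lam


if __name__ == "__main__":
    # Illustrative constants only; they are not taken from the paper.
    lam0, lam, r_lam = theorem_parameters(eps=0.01, l_f0=1.0, l_g1=2.0, mu_g=1.0, r=0.5)
    print(f"lambda_0={lam0}, lambda={lam}, r_lambda={r_lam}")
```

Two consequences follow directly from the formulas. For small $\epsilon$ the first term of the maximum dominates, giving $\lambda \asymp \epsilon^{-1}$. The second term guarantees $r_{\lambda} \le r/6$ for every $\epsilon$.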

Theorems & Definitions (33)

  • Definition 1.1: $y^*$-Aware Oracle
  • Theorem 3.1
  • Theorem 3.2
  • Lemma 3.3
  • Proposition 3.4
  • Proposition 3.5
  • Definition 4.1
  • Definition 4.2
  • Lemma 4.3
  • Lemma 4.4
  • ...and 23 more