On the Complexity of First-Order Methods in Stochastic Bilevel Optimization
Jeongyeol Kwon, Dohyun Kwon, Hanbaek Lyu
TL;DR
This work analyzes the fundamental complexity of finding stationary points in stochastic bilevel optimization with a strongly convex lower level, under a $y^*(x)$-aware oracle that provides an $O(\epsilon)$-accurate lower-level solution and locally unbiased gradients. It introduces a penalty-based approach that reduces the bilevel problem to a single-level surrogate $\mathcal{L}_{\lambda}^*(x)$ and leverages inner-outer loop schemes to control bias and variance, achieving $O(\epsilon^{-6})$ complexity without stochastic smoothness and $O(\epsilon^{-4})$ with it, for $\lambda=O(\epsilon^{-1})$. The paper also derives matching lower bounds $\Omega(\epsilon^{-6})$ and $\Omega(\epsilon^{-4})$ via probabilistic zero-chains, demonstrating that any algorithm playing against a $y^*(x)$-aware oracle cannot surpass these rates under the stated assumptions. By connecting the upper and lower bounds, the results reveal that first-order methods, under mild smoothness conditions, can match certain second-order benchmarks in bilevel settings and establish tight complexity barriers for this oracle model. The findings have implications for the design of efficient bilevel optimization algorithms in meta-learning, hyperparameter tuning, and adversarial learning, clarifying when a $y^*(x)$-aware oracle can yield substantial gains and when intrinsic difficulty limits progress.
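To make the penalty reduction concrete, the following is a minimal deterministic sketch on a toy scalar problem. It is not the paper's algorithm. It assumes the commonly used penalty form $\mathcal{L}_{\lambda}(x,y) = f(x,y) + \lambda\,(g(x,y) - \min_z g(x,z))$ with $\mathcal{L}_{\lambda}^*(x) = \min_y \mathcal{L}_{\lambda}(x,y)$, which is not spelled out in this summary. The objectives `f` and `g`, the step sizes, and all function names are hypothetical choices made for illustration, and no stochastic noise is modeled.

```python
# Toy bilevel problem (illustrative assumption, not taken from the paper):
#   upper level  f(x, y) = 0.5 * (y - 1)^2 + 0.5 * x^2
#   lower level  g(x, y) = 0.5 * (y - x)^2, so y*(x) = x
# The true hyper-objective is F(x) = f(x, y*(x)), with F'(x) = 2x - 1.

def grad_f(x, y):
    """Return (df/dx, df/dy) for the toy upper-level objective."""
    return x, y - 1.0


def grad_g(x, y):
    """Return (dg/dx, dg/dy) for the toy lower-level objective."""
    return -(y - x), y - x


def penalty_gradient(x, lam, inner_steps=200):
    """Approximate the gradient of the penalty surrogate at x.

    Uses two inner loops: one for z ~ argmin_y g(x, y) and one for
    y ~ argmin_y f(x, y) + lam * g(x, y). The returned value is
    d/dx f(x, y) + lam * (d/dx g(x, y) - d/dx g(x, z)).
    """
    z = 0.0
    for _ in range(inner_steps):
        z -= 0.5 * grad_g(x, z)[1]

    y = 0.0
    step = 1.0 / (1.0 + lam)  # inverse curvature of f + lam * g in y
    for _ in range(inner_steps):
        y -= step * (grad_f(x, y)[1] + lam * grad_g(x, y)[1])

    fx, _ = grad_f(x, y)
    gx_at_y, _ = grad_g(x, y)
    gx_at_z, _ = grad_g(x, z)
    return fx + lam * (gx_at_y - gx_at_z)


def run_outer_loop(x0, lam, lr=0.1, outer_steps=200):
    """Plain gradient descent on the penalty surrogate (outer loop)."""
    x = x0
    for _ in range(outer_steps):
        x -= lr * penalty_gradient(x, lam)
    return x


def true_hypergradient(x):
    """Closed-form F'(x) for the toy problem, used only for comparison."""
    return 2.0 * x - 1.0
```

On this toy problem the surrogate gradient differs from $F'(x)$ by $|1-x|/(1+\lambda)$, so the bias shrinks as $O(1/\lambda)$. This is the kind of trade-off that motivates a choice such as $\lambda = O(\epsilon^{-1})$: a larger $\lambda$ reduces bias but makes the penalized inner problem stiffer.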
Abstract
We consider the problem of finding stationary points in bilevel optimization when the lower-level problem is unconstrained and strongly convex. The problem has been extensively studied in recent years; the main technical challenge is to keep track of lower-level solutions $y^*(x)$ in response to changes in the upper-level variables $x$. Consequently, all existing approaches tie their analyses to a genie algorithm that knows the lower-level solutions and therefore need not query any points far from them. We consider a dual question to such approaches: suppose we have an oracle, which we call $y^*$-aware, that returns an $O(\epsilon)$-estimate of the lower-level solution, in addition to first-order gradient estimators that are locally unbiased within the $\Theta(\epsilon)$-ball around $y^*(x)$. We study the complexity of finding stationary points with such a $y^*$-aware oracle: we propose a simple first-order method that converges to an $\epsilon$-stationary point using $O(\epsilon^{-6})$ or $O(\epsilon^{-4})$ calls to first-order $y^*$-aware oracles. Our upper bounds also apply to standard unbiased first-order oracles, improving the best-known complexity of first-order methods by a factor of $O(\epsilon)$ with minimal assumptions. We then provide matching $\Omega(\epsilon^{-6})$ and $\Omega(\epsilon^{-4})$ lower bounds without and with an additional smoothness assumption on $y^*$-aware oracles, respectively. Our results imply that any approach that simulates an algorithm with a $y^*$-aware oracle must suffer the same lower bounds.
