On the Condition Number Dependency in Bilevel Optimization
Lesi Chen, Jingzhao Zhang
TL;DR
The paper analyzes the oracle complexity of finding ε-stationary points in nonconvex-strongly-convex (NC-SC) bilevel problems. It establishes a new Ω(κ_y^2 ε^{-2}) lower bound and an improved Õ(κ_y^{7/2} ε^{-2}) upper bound that is near-optimal in ε, thereby establishing a provable gap between bilevel and minimax problems. The results extend to higher-order smoothness, stochastic settings, and convex hyper-objectives, yielding several improved lower bounds (e.g., Ω(κ_y^{25/14} ε^{-12/7}) for second-order NC-SC and Ω(κ_y^{5/4}/√ε) for convex-strongly-convex (C-SC) problems). The proposed accelerated fully first-order algorithms (F^2BA^+, AccF^2BA^+) achieve tighter upper bounds by solving the lower-level problem with accelerated gradient descent (AGD). The work shows that NC-SC bilevel optimization can be provably harder than NC-SC minimax optimization and provides a systematic framework for translating lower-bound constructions across problem settings. Numerical experiments on learn-to-regularize tasks support the theoretical gains from replacing inner GD with AGD. Open questions include closing the remaining gaps between upper and lower bounds and extending the analysis to broader bilevel formulations and zeroth-order settings.
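To see why swapping the inner gradient descent (GD) solver for AGD can reduce the condition number dependency, the sketch below counts iterations on a toy strongly convex quadratic. It is only an illustration of the standard κ versus √κ behaviour of GD and Nesterov's constant-momentum AGD, not the paper's F^2BA^+ or AccF^2BA^+ algorithms. The function name `steps_to_tolerance`, the diagonal test problem, and the tolerance are all assumptions made here.

```python
import numpy as np

def steps_to_tolerance(kappa, accelerated, tol=1e-6, dim=50, max_iter=1_000_000):
    """Count iterations until ||y - y*|| <= tol * ||y0 - y*|| on a toy quadratic.

    Hypothetical lower-level objective (an assumption for illustration):
        g(y) = 0.5 * sum(eigs * y**2) - b @ y,
    which is mu-strongly convex and L-smooth with mu = 1 and L = kappa.
    """
    eigs = np.linspace(1.0, kappa, dim)   # Hessian eigenvalues in [mu, L]
    y_star = np.ones(dim)                 # known minimizer
    b = eigs * y_star                     # chosen so that grad g(y_star) = 0
    mu, L = 1.0, float(kappa)

    # Nesterov's constant momentum for strongly convex problems; 0 gives plain GD.
    if accelerated:
        beta = (np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))
    else:
        beta = 0.0

    y = np.zeros(dim)
    z = y.copy()                          # extrapolated point
    r0 = np.linalg.norm(y - y_star)
    for k in range(1, max_iter + 1):
        y_next = z - (eigs * z - b) / L   # gradient step with step size 1/L
        z = y_next + beta * (y_next - y)  # momentum extrapolation
        y = y_next
        if np.linalg.norm(y - y_star) <= tol * r0:
            return k
    return max_iter

for kappa in (100, 1000):
    gd = steps_to_tolerance(kappa, accelerated=False)
    agd = steps_to_tolerance(kappa, accelerated=True)
    print(f"kappa={kappa}: GD steps={gd}, AGD steps={agd}")
```

On this toy problem the GD iteration count grows roughly linearly in κ, while the AGD count grows roughly like √κ. That difference is the kind of saving an accelerated inner solver can contribute to the overall complexity.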
Abstract
Bilevel optimization minimizes an objective function, defined by an upper-level problem whose feasible region is the solution of a lower-level problem. We study the oracle complexity of finding an $ε$-stationary point with first-order methods when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent works (Ji et al., ICML 2021; Arbel and Mairal, ICLR 2022; Chen et al., JMLR 2025) achieve a $\tilde{\mathcal{O}}(κ^4 ε^{-2})$ upper bound that is near-optimal in $ε$. However, the optimal dependency on the condition number $κ$ is unknown. In this work, we establish a new $Ω(κ^2 ε^{-2})$ lower bound and a $\tilde{\mathcal{O}}(κ^{7/2} ε^{-2})$ upper bound for this problem, which yields the first provable gap between bilevel problems and minimax problems in this setup. Our lower bounds can be extended to various settings, including high-order smooth functions, stochastic oracles, and convex hyper-objectives: (1) For second-order and arbitrarily smooth problems, we show $Ω(κ_y^{13/4} ε^{-12/7})$ and $Ω(κ^{17/10} ε^{-8/5})$ lower bounds, respectively. (2) For convex-strongly-convex problems, we improve the previously best lower bound (Ji and Liang, JMLR 2022) from $Ω(κ/\sqrt{ε})$ to $Ω(κ^{5/4} / \sqrt{ε})$. (3) For smooth stochastic problems, we show an $Ω(κ^4 ε^{-4})$ lower bound.
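For orientation, the setting described above can be written in the standard form below. The symbols $f$, $g$, $φ$, $y^*$, $μ$, and $L$ are notation assumed here rather than taken from the abstract, and the paper's precise smoothness assumptions may differ:

$$
\min_{x} \; φ(x) := f\bigl(x, y^*(x)\bigr), \qquad y^*(x) = \arg\min_{y} \; g(x, y),
$$

where $g(x, \cdot)$ is $μ$-strongly convex with $L$-Lipschitz gradient, the condition number is $κ = L/μ$, and a point $x$ is $ε$-stationary if $\|\nabla φ(x)\| \le ε$.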
