On the Condition Number Dependency in Bilevel Optimization

Lesi Chen; Jingzhao Zhang

TL;DR

The paper analyzes the oracle complexity of finding ε-stationary points in nonconvex-strongly-convex (NC-SC) bilevel problems, establishing a new Ω(κ_y^2 ε^{-2}) lower bound and an Õ(κ_y^{7/2} ε^{-2}) upper bound that is near-optimal in ε, thereby establishing a provable gap between bilevel and minimax problems. It extends these results to higher-order smoothness, stochastic settings, and convex hyper-objectives, deriving several improved lower bounds (e.g., Ω(κ_y^{25/14} ε^{-12/7}) for second-order NC-SC and Ω(κ_y^{5/4}/√ε) for convex-strongly-convex (C-SC) problems) and proposing accelerated fully first-order algorithms (F^2BA^+, AccF^2BA^+) that achieve tighter upper bounds by solving the lower-level problem with accelerated gradient descent (AGD). The work shows that NC-SC bilevel optimization can be provably harder than NC-SC minimax optimization and provides a systematic framework for translating lower-bound constructions across problem settings. Numerical experiments on learn-to-regularize tasks validate the theoretical gains from replacing inner gradient descent (GD) with AGD. Open questions include closing the remaining gaps between upper and lower bounds and extending the analysis to broader bilevel formulations and zeroth-order settings.
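The intuition behind swapping inner GD for AGD can be illustrated on a toy strongly convex quadratic. The sketch below is not the paper's F^2BA^+ algorithm; it only compares how many iterations plain GD and constant-momentum Nesterov AGD need on a single lower-level-style problem. The function names (`make_quadratic`, `gd_steps`, `agd_steps`) and the test problem are assumptions made for illustration.

```python
import numpy as np


def make_quadratic(kappa, n):
    """Hypothetical test problem: g(y) = 0.5 * y^T A y - b^T y,
    with eigenvalues of A spread over [1, kappa], so mu = 1 and L = kappa."""
    A = np.diag(np.linspace(1.0, kappa, n))
    b = np.ones(n)
    return A, b


def gd_steps(A, b, L, tol, max_iter=200_000):
    """Count gradient-descent iterations (step size 1/L) until ||grad|| <= tol."""
    y = np.zeros_like(b)
    for k in range(max_iter):
        grad = A @ y - b
        if np.linalg.norm(grad) <= tol:
            return k
        y = y - grad / L
    return max_iter


def agd_steps(A, b, mu, L, tol, max_iter=200_000):
    """Count Nesterov AGD iterations (constant momentum for strongly convex
    objectives) until ||grad|| <= tol at the main iterate y."""
    sqrt_kappa = np.sqrt(L / mu)
    beta = (sqrt_kappa - 1.0) / (sqrt_kappa + 1.0)
    y = np.zeros_like(b)
    z = y.copy()  # extrapolated point where the gradient is evaluated
    for k in range(max_iter):
        if np.linalg.norm(A @ y - b) <= tol:
            return k
        y_next = z - (A @ z - b) / L      # gradient step from z
        z = y_next + beta * (y_next - y)  # momentum extrapolation
        y = y_next
    return max_iter


if __name__ == "__main__":
    kappa = 400.0
    A, b = make_quadratic(kappa, n=20)
    print("GD iterations: ", gd_steps(A, b, L=kappa, tol=1e-6))
    print("AGD iterations:", agd_steps(A, b, mu=1.0, L=kappa, tol=1e-6))
```

On problems like this one, GD's iteration count scales roughly with κ while AGD's scales roughly with √κ, which is the kind of saving in the inner solve that the TL;DR attributes to the accelerated variants.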

Abstract

Bilevel optimization minimizes an objective function, defined by an upper-level problem whose feasible region is the solution of a lower-level problem. We study the oracle complexity of finding an $ε$-stationary point with first-order methods when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent works (Ji et al., ICML 2021; Arbel and Mairal, ICLR 2022; Chen et al., JMLR 2025) achieve a $\tilde{\mathcal{O}}(κ^4 ε^{-2})$ upper bound that is near-optimal in $ε$. However, the optimal dependency on the condition number $κ$ is unknown. In this work, we establish a new $Ω(κ^2 ε^{-2})$ lower bound and $\tilde{\mathcal{O}}(κ^{7/2} ε^{-2})$ upper bound for this problem, establishing the first provable gap between bilevel problems and minimax problems in this setup. Our lower bounds can be extended to various settings, including high-order smooth functions, stochastic oracles, and convex hyper-objectives: (1) For second-order and arbitrarily smooth problems, we show $Ω(κ_y^{13/4} ε^{-12/7})$ and $Ω(κ^{17/10} ε^{-8/5})$ lower bounds, respectively. (2) For convex-strongly-convex problems, we improve the previously best lower bound (Ji and Liang, JMLR 2022) from $Ω(κ/\sqrt{ε})$ to $Ω(κ^{5/4} / \sqrt{ε})$. (3) For smooth stochastic problems, we show an $Ω(κ^4 ε^{-4})$ lower bound.
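For reference, the setting described in the abstract is usually formalized as below. The symbols $f$, $g$, $\varphi$, $L$, and $μ$ are conventional notation assumed here for illustration; the abstract itself does not fix them, and the paper's exact definitions may differ in detail.

$$
\min_{x \in \mathbb{R}^{d_x}} \varphi(x) := f\bigl(x, y^*(x)\bigr),
\qquad
y^*(x) := \arg\min_{y \in \mathbb{R}^{d_y}} g(x, y),
$$

where $g(x, \cdot)$ is assumed to be $μ$-strongly convex with $L$-Lipschitz gradient, the condition number is $κ = L/μ$, and a point $x$ is called $ε$-stationary if $\|\nabla \varphi(x)\| \le ε$.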
