On the Condition Number Dependency in Bilevel Optimization

Lesi Chen, Jingzhao Zhang

TL;DR

The paper analyzes the oracle complexity of finding ε-stationary points in nonconvex-strongly-convex (NC-SC) bilevel problems. It establishes a new Ω(κ_y^2 ε^{-2}) lower bound and an Õ(κ_y^{7/2} ε^{-2}) upper bound that is near-optimal in ε, which yields the first provable gap between bilevel and minimax problems in this setting: NC-SC bilevel optimization can be strictly harder than NC-SC minimax optimization. The lower bounds extend to higher-order smoothness, stochastic oracles, and convex hyper-objectives (e.g., Ω(κ_y^{25/14} ε^{-12/7}) for second-order NC-SC and Ω(κ_y^{5/4}/√ε) for convex-strongly-convex (C-SC) problems), and the paper gives a systematic framework for translating lower-bound constructions across problem settings. On the algorithmic side, it proposes accelerated fully first-order algorithms (F^2BA^+, AccF^2BA^+) that achieve tighter upper bounds by solving the lower-level problem with accelerated gradient descent (AGD). Numerical experiments on learn-to-regularize tasks support the theoretical gain from replacing inner GD with AGD. Open questions include closing the remaining gaps between upper and lower bounds and extending the analysis to broader bilevel formulations and zeroth-order settings.
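To make the GD-versus-AGD point concrete, here is a minimal sketch that counts iterations of both methods on an ill-conditioned strongly convex quadratic. It is not the paper's F^2BA^+ or AccF^2BA^+; the diagonal test problem, the helper names `gd_iters` and `agd_iters`, the tolerance, and the constant-momentum form of AGD are all assumptions chosen for illustration.

```python
import numpy as np

def gd_iters(h, tol=1e-6, max_iter=10**6):
    """Gradient descent on g(y) = 0.5 * sum(h * y**2) with step size 1/L."""
    L = h.max()
    y = np.ones_like(h)
    for k in range(1, max_iter + 1):
        y = y - (1.0 / L) * (h * y)          # gradient of g is h * y
        if np.linalg.norm(h * y) <= tol:
            return k
    return max_iter

def agd_iters(h, tol=1e-6, max_iter=10**6):
    """Nesterov's accelerated gradient descent with constant momentum."""
    mu, L = h.min(), h.max()
    q = np.sqrt(L / mu)
    beta = (q - 1.0) / (q + 1.0)             # momentum for strongly convex g
    y = np.ones_like(h)
    z = y.copy()                             # extrapolated point
    for k in range(1, max_iter + 1):
        y_next = z - (1.0 / L) * (h * z)     # gradient step from z
        z = y_next + beta * (y_next - y)     # momentum extrapolation
        y = y_next
        if np.linalg.norm(h * y) <= tol:
            return k
    return max_iter

# Hypothetical test problem: eigenvalues spread over [1, kappa], so mu = 1, L = kappa.
kappa = 1e3
h = np.linspace(1.0, kappa, 50)
n_gd = gd_iters(h)
n_agd = agd_iters(h)
print(f"kappa={kappa:.0f}: GD iterations={n_gd}, AGD iterations={n_agd}")
```

On problems like this one, GD needs on the order of κ iterations per digit of accuracy while AGD needs on the order of √κ, which is the kind of saving an inner AGD solver passes on to the overall bilevel complexity.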

Abstract

Bilevel optimization minimizes an objective function, defined by an upper-level problem whose feasible region is the solution of a lower-level problem. We study the oracle complexity of finding an $ε$-stationary point with first-order methods when the upper-level problem is nonconvex and the lower-level problem is strongly convex. Recent works (Ji et al., ICML 2021; Arbel and Mairal, ICLR 2022; Chen et al., JMLR 2025) achieve a $\tilde{\mathcal{O}}(κ^4 ε^{-2})$ upper bound that is near-optimal in $ε$. However, the optimal dependency on the condition number $κ$ is unknown. In this work, we establish a new $Ω(κ^2 ε^{-2})$ lower bound and a new $\tilde{\mathcal{O}}(κ^{7/2} ε^{-2})$ upper bound for this problem, establishing the first provable gap between bilevel problems and minimax problems in this setup. Our lower bounds can be extended to various settings, including high-order smooth functions, stochastic oracles, and convex hyper-objectives: (1) For second-order and arbitrarily smooth problems, we show $Ω(κ^{13/7} ε^{-12/7})$ and $Ω(κ^{17/10} ε^{-8/5})$ lower bounds, respectively. (2) For convex-strongly-convex problems, we improve the previously best lower bound (Ji and Liang, JMLR 2022) from $Ω(κ/\sqrt{ε})$ to $Ω(κ^{5/4} / \sqrt{ε})$. (3) For smooth stochastic problems, we show an $Ω(κ^4 ε^{-4})$ lower bound.
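For orientation, the setting described in the abstract can be written in the standard bilevel form below. The symbols $f$, $φ$, $y^*$, $L$, and $μ$ are notational assumptions not fixed by this summary: $g$ is taken to be $μ$-strongly convex in $y$ and $L$-smooth, and an $ε$-stationary point is a point $x$ with $\|\nabla φ(x)\| \le ε$.

```latex
\[
  \min_{x \in \mathbb{R}^{d_x}} \varphi(x) := f\bigl(x, y^*(x)\bigr),
  \qquad
  y^*(x) = \arg\min_{y \in \mathbb{R}^{d_y}} g(x, y),
  \qquad
  \kappa = \frac{L}{\mu}.
\]
\[
  \underbrace{\Omega\bigl(\kappa^{2}\,\epsilon^{-2}\bigr)}_{\text{lower bound}}
  \;\le\; \text{oracle complexity} \;\le\;
  \underbrace{\tilde{\mathcal{O}}\bigl(\kappa^{7/2}\,\epsilon^{-2}\bigr)}_{\text{upper bound}},
  \qquad
  \text{remaining gap: } \kappa^{7/2} / \kappa^{2} = \kappa^{3/2}.
\]
```

Under this reading, the two bounds match in $ε$ and differ only by a factor of $κ^{3/2}$ (up to logarithmic terms) in the condition number.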

Paper Structure

This paper contains 31 sections, 19 theorems, 92 equations, 2 figures, 3 tables.

Key Result

Lemma 2.1

If $g$ is a quadratic function of the form (eq:quadratic-g) (cf. Definition 2.3, quadratic lower-level problem), then ${\mathcal{A}}^{{\rm hvp}} = {\mathcal{A}}^{\rm fo}$.
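One way to build intuition for this lemma: when $g$ is quadratic, a Hessian-vector product (HVP) can be recovered exactly from two gradient queries, so an HVP oracle gives no extra power in that direction. The sketch below checks this identity numerically. It is an illustration, not the paper's proof, and the quadratic form used here, $g(x, y) = \tfrac{1}{2} y^\top H y - y^\top J x$ with hypothetical matrices `H` and `J`, is an assumption, since the form in (eq:quadratic-g) is not reproduced in this summary.

```python
import numpy as np

rng = np.random.default_rng(0)
dx, dy = 3, 4

# Hypothetical quadratic lower-level problem: g(x, y) = 0.5 * y^T H y - y^T J x.
A = rng.standard_normal((dy, dy))
H = A @ A.T + np.eye(dy)        # symmetric positive definite, so g is strongly convex in y
J = rng.standard_normal((dy, dx))

def grad_y(x, y):
    """First-order oracle: gradient of g with respect to y."""
    return H @ y - J @ x

def hvp_from_gradients(x, y, v):
    """HVP built from two gradient queries; exact because g is quadratic in y."""
    return grad_y(x, y + v) - grad_y(x, y)

x = rng.standard_normal(dx)
y = rng.standard_normal(dy)
v = rng.standard_normal(dy)

exact = H @ v                            # true Hessian-vector product
approx = hvp_from_gradients(x, y, v)     # same quantity from gradients only
print(np.allclose(exact, approx))
```

For a non-quadratic $g$ the gradient difference is only a finite-difference approximation of the HVP, which is why the statement is restricted to the quadratic case.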

Figures (2)

  • Figure 1: (a) The construction for the NC-SC minimax lower bound (li2021complexity; wang2024efficient). (b) The construction for our NC-SC bilevel lower bound. An orange node or arrow represents a coordinate or connection in the upper level, while black nodes and arrows represent those in the lower level. A bold node indicates that a nonconvex regularization is added on the corresponding coordinate.
  • Figure 2: Performance of different algorithms when learning the optimal regularization.

Theorems & Definitions (62)

  • Definition 2.1: $p$th-order smooth NC-SC problem
  • Definition 2.2
  • Definition 2.3: Quadratic lower-level problem
  • Definition 2.4: First-order oracle
  • Definition 2.5: First-order algorithm
  • Definition 2.6: HVP oracle
  • Definition 2.7: HVP-based algorithm
  • Lemma 2.1
  • proof
  • Definition 2.8
  • ...and 52 more