First-Order Methods for Linearly Constrained Bilevel Optimization
Guy Kornowski, Swati Padmanabhan, Kai Wang, Zhe Zhang, Suvrit Sra
TL;DR
The paper tackles bilevel optimization with linear lower-level constraints by developing fully first-order methods that avoid Hessian computations. For lower levels with linear equality constraints, it proves that the hyperobjective $F$ is smooth and uses a finite-difference hypergradient proxy to achieve $ε$-stationarity in $\widetilde{O}(ε^{-2})$ gradient oracle calls. For linear inequality constraints, where $F$ may be nonsmooth, the authors formulate inexact zeroth-order and gradient-oracle frameworks that converge to $(δ,ε)$-Goldstein stationary points in $\widetilde{O}(d δ^{-1} ε^{-3})$ gradient oracle calls, where $d$ is the upper-level dimension. Given oracle access to the optimal dual variable, they obtain the dimension-free rate $\widetilde{O}(δ^{-1} ε^{-4})$. The core strategy combines a penalty-based reformulation that approximates the hypergradient with online-to-nonconvex reductions and implementable perturbation-based gradient methods, and is supported by preliminary numerical experiments. Overall, the work extends first-order techniques to linearly constrained bilevel programs and lays groundwork for guarantees beyond linear constraints.
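To make the idea of a zeroth-order gradient oracle concrete, the sketch below implements the standard two-point randomized-smoothing estimator in Python with NumPy. It is a generic textbook construction and not the paper's algorithm: the function names (`two_point_gradient_estimate`, `averaged_estimate`), the exact function-value oracle `F`, and the sample count are assumptions made for illustration. In the bilevel setting, `F` would only be available inexactly, since evaluating it requires solving the lower-level problem.

```python
import numpy as np


def sample_unit_sphere(d, rng):
    """Draw a point uniformly from the unit sphere in R^d."""
    v = rng.standard_normal(d)
    return v / np.linalg.norm(v)


def two_point_gradient_estimate(F, x, delta, rng):
    """Two-point estimate of the gradient of the delta-smoothed F at x.

    Uses only two function evaluations; the factor d comes from the
    uniform-sphere smoothing construction.
    """
    d = x.shape[0]
    u = sample_unit_sphere(d, rng)
    return (d / (2.0 * delta)) * (F(x + delta * u) - F(x - delta * u)) * u


def averaged_estimate(F, x, delta, num_samples, seed=0):
    """Average independent two-point estimates to reduce variance."""
    rng = np.random.default_rng(seed)
    total = np.zeros_like(x, dtype=float)
    for _ in range(num_samples):
        total += two_point_gradient_estimate(F, x, delta, rng)
    return total / num_samples
```

For a linear function such as `F = lambda x: float(a @ x)`, each single estimate equals `d * (a @ u) * u`, whose expectation over `u` is `a`, so the averaged estimate should approach `a` as `num_samples` grows.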
Abstract
Algorithms for bilevel optimization often require Hessian computations, which are prohibitive in high dimensions. While recent works offer first-order methods for unconstrained bilevel problems, the constrained setting remains relatively underexplored. We present first-order methods for linearly constrained bilevel optimization with finite-time hypergradient stationarity guarantees. For linear equality constraints, we attain $ε$-stationarity in $\widetilde{O}(ε^{-2})$ gradient oracle calls, which is nearly optimal. For linear inequality constraints, we attain $(δ,ε)$-Goldstein stationarity in $\widetilde{O}(d δ^{-1} ε^{-3})$ gradient oracle calls, where $d$ is the upper-level dimension. Finally, for the linear inequality setting, we obtain a dimension-free oracle complexity of $\widetilde{O}(δ^{-1} ε^{-4})$ under the additional assumption of oracle access to the optimal dual variable. Along the way, we develop new nonsmooth nonconvex optimization methods with inexact oracles. We verify these guarantees with preliminary numerical experiments.
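For reference, the two stationarity notions named above can be written out as follows. The problem template and the symbols $f$, $g$, $A$, $B$, $b$ are assumed notation for a generic linearly constrained bilevel program and may differ from the paper's own; the Goldstein $δ$-subdifferential is the standard definition.

$$
\min_{x \in \mathbb{R}^d} F(x) := f\bigl(x, y^*(x)\bigr), \qquad y^*(x) \in \arg\min_{y \,:\, Ax + By \le b} g(x, y)
$$

$$
\text{$ε$-stationarity (smooth $F$):} \quad \|\nabla F(x)\| \le ε
$$

$$
\text{$(δ,ε)$-Goldstein stationarity:} \quad \operatorname{dist}\bigl(0, \partial_δ F(x)\bigr) \le ε, \qquad \partial_δ F(x) := \operatorname{conv}\Bigl(\,\bigcup_{z \,:\, \|z - x\| \le δ} \partial F(z)\Bigr)
$$

Here $\partial F$ denotes the Clarke subdifferential. When $F$ is differentiable, shrinking $δ$ to $0$ recovers the first condition, which is why the Goldstein notion serves as the nonsmooth counterpart of $ε$-stationarity.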
