First-Order Methods for Linearly Constrained Bilevel Optimization

Guy Kornowski, Swati Padmanabhan, Kai Wang, Zhe Zhang, Suvrit Sra

TL;DR

The paper tackles bilevel optimization with linearly constrained lower-level problems by developing fully first-order methods that avoid Hessian computations. For lower levels with linear equality constraints, it proves that the hyperobjective $F$ is smooth and uses a finite-difference hypergradient proxy to achieve ε-stationarity in $\widetilde{O}(ε^{-2})$ gradient oracle calls. For linear inequality constraints, where $F$ may be nonsmooth, the authors formulate inexact zeroth-order and gradient-oracle frameworks that converge to $(δ,ε)$-Goldstein stationary points in $\widetilde{O}(d δ^{-1} ε^{-3})$ gradient oracle calls, where $d$ is the upper-level dimension; with oracle access to the optimal dual variable, they obtain the dimension-free rate $\widetilde{O}(δ^{-1} ε^{-4})$. The core strategy combines a penalty-based reformulation that approximates the hypergradient with online-to-nonconvex reductions and implementable perturbation-based gradient methods, and is checked with preliminary numerical experiments. The work extends first-order guarantees from unconstrained to linearly constrained bilevel programs and lays groundwork for extending them beyond linear constraints.
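The finite-difference idea mentioned above can be illustrated on a toy problem. The following is a minimal sketch, not the paper's algorithm: it assumes a hypothetical quadratic instance with $f(x,y)=\tfrac12\|x\|^2+\tfrac12\|y-t\|^2$, $g(x,y)=\tfrac12\|y-Mx\|^2$, and the constraint $Ax+By=c$, and it solves each lower-level problem exactly through its KKT system. All names (`make_toy_problem`, `solve_lower`, `hypergrad_proxy`, `hypergrad_implicit`) are illustrative. The proxy differentiates the value function of the perturbed lower-level problem $\min_y g+\delta f$ in $x$ (envelope theorem) and takes a forward difference in $\delta$; the implicit-differentiation formula is included only as a reference to compare against.

```python
import numpy as np

def make_toy_problem(seed=0, dx=3, dy=5, m=2):
    # Hypothetical random instance; not taken from the paper.
    rng = np.random.default_rng(seed)
    M = rng.standard_normal((dy, dx))
    A = rng.standard_normal((m, dx))
    B = rng.standard_normal((m, dy))
    c = rng.standard_normal(m)
    t = rng.standard_normal(dy)
    x = rng.standard_normal(dx)
    return x, M, A, B, c, t

def kkt_matrix(delta, B):
    # KKT matrix of min_y g + delta*f subject to A x + B y = c.
    m, dy = B.shape
    return np.block([[(1.0 + delta) * np.eye(dy), B.T],
                     [B, np.zeros((m, m))]])

def solve_lower(x, delta, M, A, B, c, t):
    # Returns the primal solution y and the multiplier lam of the
    # perturbed lower-level problem (delta = 0 gives the original one).
    dy = B.shape[1]
    rhs = np.concatenate([M @ x + delta * t, c - A @ x])
    sol = np.linalg.solve(kkt_matrix(delta, B), rhs)
    return sol[:dy], sol[dy:]

def grad_x_lagrangian(x, y, lam, M, A):
    # Gradient in x of g(x, y) + lam^T (A x + B y - c).
    return -M.T @ (y - M @ x) + A.T @ lam

def hypergrad_proxy(x, delta, M, A, B, c, t):
    # First-order proxy: only gradients of f and g in x are evaluated.
    y0, lam0 = solve_lower(x, 0.0, M, A, B, c, t)
    yd, lamd = solve_lower(x, delta, M, A, B, c, t)
    diff = (grad_x_lagrangian(x, yd, lamd, M, A)
            - grad_x_lagrangian(x, y0, lam0, M, A))
    return x + diff / delta  # grad_x f(x, y) = x for this toy f

def hypergrad_implicit(x, M, A, B, c, t):
    # Reference hypergradient via implicit differentiation of the KKT system.
    m, dy = B.shape
    y0, _ = solve_lower(x, 0.0, M, A, B, c, t)
    uv = np.linalg.solve(kkt_matrix(0.0, B),
                         np.concatenate([y0 - t, np.zeros(m)]))
    u, v = uv[:dy], uv[dy:]
    return x + M.T @ u - A.T @ v

if __name__ == "__main__":
    x, M, A, B, c, t = make_toy_problem(0)
    proxy = hypergrad_proxy(x, 1e-6, M, A, B, c, t)
    ref = hypergrad_implicit(x, M, A, B, c, t)
    print("proxy error:", np.linalg.norm(proxy - ref))
```

In this sketch the lower-level problems are solved exactly; in practice they would only be solved approximately with a first-order solver, and the choice of $\delta$ would have to balance the finite-difference bias against that inexactness.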

Abstract

Algorithms for bilevel optimization often encounter Hessian computations, which are prohibitive in high dimensions. While recent works offer first-order methods for unconstrained bilevel problems, the constrained setting remains relatively underexplored. We present first-order linearly constrained optimization methods with finite-time hypergradient stationarity guarantees. For linear equality constraints, we attain $ε$-stationarity in $\widetilde{O}(ε^{-2})$ gradient oracle calls, which is nearly-optimal. For linear inequality constraints, we attain $(δ,ε)$-Goldstein stationarity in $\widetilde{O}(d{δ^{-1} ε^{-3}})$ gradient oracle calls, where $d$ is the upper-level dimension. Finally, we obtain for the linear inequality setting dimension-free rates of $\widetilde{O}({δ^{-1} ε^{-4}})$ oracle complexity under the additional assumption of oracle access to the optimal dual variable. Along the way, we develop new nonsmooth nonconvex optimization methods with inexact oracles. We verify these guarantees with preliminary numerical experiments.
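
For readers unfamiliar with the stationarity notion used for the inequality-constrained case, the standard definition of $(\delta,\epsilon)$-Goldstein stationarity can be written as follows. This is the usual textbook form in generic notation, with $\partial F$ the Clarke subdifferential and $\mathbb{B}_\delta(x)$ the closed ball of radius $\delta$ around $x$; the paper's own statement may use different symbols.

```latex
\partial_\delta F(x) := \mathrm{conv}\Big( \bigcup_{z \in \mathbb{B}_\delta(x)} \partial F(z) \Big),
\qquad
x \text{ is } (\delta,\epsilon)\text{-Goldstein stationary if }
\min_{g \in \partial_\delta F(x)} \|g\| \le \epsilon .
```

When $F$ is differentiable and $\delta \to 0$, this reduces to the familiar condition $\|\nabla F(x)\| \le \epsilon$.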

Paper Structure

This paper contains 34 sections, 32 theorems, 139 equations, 1 figure, and 4 algorithms.

Key Result

Theorem 3.1

Consider the linear-equality-constrained problem (prob:lin-eq) under the smoothness assumption (assumption:linEq_smoothness), and let $\kappa=C_g/\mu_g$ be the condition number of $g$. Then the algorithm (alg:LE-full-alg) finds an $\epsilon$-stationary point (in terms of the gradient mapping, see eq:gradient-mapping) after $T=\widetilde{O}(C_F (F(x_0)-\inf F)\sqrt{\kappa}\epsilon^{-2})$ gradient oracle calls.
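
The gradient mapping referenced in the theorem is not reproduced on this page. As a hedged reminder, the standard projected gradient mapping over a closed convex set $\mathcal{X}$ with step size $\eta > 0$ reads as below; both $\mathcal{X}$ and $\eta$ are assumed notation here, and the paper's eq:gradient-mapping may differ in scaling or symbols.

```latex
\mathcal{G}_\eta(x) := \frac{1}{\eta}\Big( x - \Pi_{\mathcal{X}}\big( x - \eta \nabla F(x) \big) \Big),
\qquad
x \text{ is } \epsilon\text{-stationary if } \|\mathcal{G}_\eta(x)\| \le \epsilon .
```

When $\mathcal{X}$ is the whole space, the projection is the identity and $\mathcal{G}_\eta(x) = \nabla F(x)$.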

Figures (1)

  • Figure 1: We run the algorithm (alg:PIGD) with the inexact gradient oracle (alg:inexact-gradient-oracle) on the toy bilevel problem (eqn:experiment-bilevel-optimization) with $d_x = 100$, $d_y = 200$, $n_{\text{const}} = d_y / 5$, and accuracy $\alpha = 0.1$. The panels (fig:convergence-comparison, fig:convergence-comparison-2, fig:computation-comparison) vary the number of iterations, the gradient exactness $\alpha$, and $d_y$, respectively, to compare performance under different settings.

Theorems & Definitions (60)

  • Definition 2.1
  • Theorem 3.1
  • Lemma 3.1
  • Proof sketch (complete proof in sec:appendix_linear_equality)
  • Lemma 3.1
  • Proof sketch (see sec:appendix_linear_equality)
  • Theorem 4.1
  • Lemma 4.2
  • Proof of lem:ZeroOrderApprox
  • Lemma 4.2
  • ...and 50 more