First-Order Methods for Linearly Constrained Bilevel Optimization

Guy Kornowski; Swati Padmanabhan; Kai Wang; Zhe Zhang; Suvrit Sra

TL;DR

The paper studies bilevel optimization with linearly constrained lower-level problems and develops fully first-order methods that avoid Hessian computations. For linear equality constraints, it proves that the hyperobjective F is smooth and uses a finite-difference hypergradient proxy to reach ε-stationarity in Õ(ε^{-2}) gradient calls. For linear inequality constraints, where F may be nonsmooth, the authors formulate inexact zeroth-order and gradient-oracle frameworks that converge to (δ,ε)-Goldstein stationary points in Õ(d δ^{-1} ε^{-3}) gradient calls and, with oracle access to the optimal dual variable, obtain dimension-free rates of Õ(δ^{-1} ε^{-4}). The core strategy combines a penalty-based reformulation that approximates the hypergradient with online-to-nonconvex reductions and implementable perturbation-based gradient methods, and it is supported by preliminary numerical experiments. The work extends first-order guarantees to linearly constrained bilevel programs and lays groundwork for going beyond linear constraints.
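To make the finite-difference idea concrete, here is a minimal sketch on a hypothetical toy problem that is not taken from the paper. It assumes a lower-level constraint that does not depend on x (so no multiplier term enters the difference quotient) and a lower-level solver available in closed form. The names `solve_lower`, `grad_x_g`, `grad_x_f`, and `hypergradient_proxy`, as well as the objectives themselves, are illustrative assumptions. The proxy follows from applying the envelope theorem to the perturbed value function min_y g + δ f and uses only first-order information.

```python
# Toy bilevel problem (hypothetical, for illustration only):
#   upper level: f(x, y) = 0.5 * ((y1 - 1)^2 + y2^2)
#   lower level: g(x, y) = 0.5 * ((y1 - x)^2 + y2^2)  subject to  y1 + y2 = 1
# Closed form: F(x) = f(x, y*(x)) = 0.25 * (x - 1)^2, so F'(x) = 0.5 * (x - 1).


def solve_lower(x, delta=0.0):
    """Minimize g(x, y) + delta * f(x, y) over the line y1 + y2 = 1.

    The objective is an isotropic quadratic centered at m, so the constrained
    minimizer is the Euclidean projection of m onto the line.
    """
    m1 = (x + delta) / (1.0 + delta)  # unconstrained minimizer, first coordinate
    m2 = 0.0                          # unconstrained minimizer, second coordinate
    shift = (m1 + m2 - 1.0) / 2.0     # projection onto {y1 + y2 = 1}
    return (m1 - shift, m2 - shift)


def grad_x_g(x, y):
    """Partial derivative of g with respect to x."""
    return x - y[0]


def grad_x_f(x, y):
    """Partial derivative of f with respect to x (f does not depend on x here)."""
    return 0.0


def hypergradient_proxy(x, delta):
    """Finite-difference estimate of F'(x) built from first-order information only."""
    y_star = solve_lower(x, 0.0)     # exact lower-level solution
    y_delta = solve_lower(x, delta)  # solution of the delta-perturbed problem
    diff = grad_x_g(x, y_delta) - grad_x_g(x, y_star)
    return grad_x_f(x, y_delta) + diff / delta


def true_hypergradient(x):
    """Closed-form F'(x) for this toy problem, used only as a reference."""
    return 0.5 * (x - 1.0)


approx = hypergradient_proxy(3.0, 1e-4)
exact = true_hypergradient(3.0)
```

In this toy case the gap between `approx` and `exact` shrinks linearly in `delta`. When the constraint depends on x, a corresponding multiplier term would presumably be needed as well; that case is not covered by this sketch.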

Abstract

Algorithms for bilevel optimization often encounter Hessian computations, which are prohibitive in high dimensions. While recent works offer first-order methods for unconstrained bilevel problems, the constrained setting remains relatively underexplored. We present first-order linearly constrained optimization methods with finite-time hypergradient stationarity guarantees. For linear equality constraints, we attain $ε$-stationarity in $\widetilde{O}(ε^{-2})$ gradient oracle calls, which is nearly optimal. For linear inequality constraints, we attain $(δ,ε)$-Goldstein stationarity in $\widetilde{O}(d\,δ^{-1} ε^{-3})$ gradient oracle calls, where $d$ is the upper-level dimension. Finally, we obtain for the linear inequality setting dimension-free rates of $\widetilde{O}(δ^{-1} ε^{-4})$ oracle complexity under the additional assumption of oracle access to the optimal dual variable. Along the way, we develop new nonsmooth nonconvex optimization methods with inexact oracles. We verify these guarantees with preliminary numerical experiments.
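The dimension factor $d$ in the zeroth-order rate is typical of randomized-smoothing gradient estimators. The sketch below shows one standard estimator of this kind: a two-point difference along a random unit direction, which is an unbiased estimate of the gradient of the uniform-ball smoothing of $F$ with radius $δ$. It is a generic illustration and not the paper's algorithm; it assumes exact function evaluations, whereas the paper works with inexact oracles. All function names are hypothetical.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a point uniformly from the unit sphere in R^d."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def smoothed_grad_estimate(F, x, delta, rng):
    """Two-point estimate: (d / (2*delta)) * (F(x + delta*u) - F(x - delta*u)) * u."""
    d = len(x)
    u = sample_unit_sphere(d, rng)
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    scale = d * (F(x_plus) - F(x_minus)) / (2.0 * delta)
    return [scale * ui for ui in u]


def averaged_estimate(F, x, delta, num_samples, seed=0):
    """Average independent estimates to reduce variance."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(num_samples):
        g = smoothed_grad_estimate(F, x, delta, rng)
        total = [t + gi for t, gi in zip(total, g)]
    return [t / num_samples for t in total]


# Sanity check on a linear function, whose smoothed gradient equals its gradient.
a = [1.0, -2.0, 0.5]


def linear_F(x):
    return sum(ai * xi for ai, xi in zip(a, x))


est = averaged_estimate(linear_F, [0.3, -0.1, 0.2], delta=0.1, num_samples=20000)
```

For a Lipschitz but nonsmooth $F$, the gradient of the smoothed function is known to lie in the Goldstein $δ$-subdifferential of $F$, which is why estimators of this form pair naturally with $(δ,ε)$-Goldstein stationarity.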

Paper Structure (34 sections, 32 theorems, 139 equations, 1 figure, 4 algorithms)

Introduction
Our contributions
Related work
Preliminaries
Assumptions
Lower-level problem with linear equality constraint
Main technical ideas
Lower-level problem with linear inequality constraint: nonsmooth nonconvex optimization with inexact oracles
Nonsmooth nonconvex optimization with inexact zeroth-order oracle
Nonsmooth nonconvex optimization with inexact gradient oracle
Implementation-friendly algorithm
Lower-level problem with linear inequality constraint: constructing the inexact gradient oracle
Reformulation via the penalty method
Main result: approximating the hypergradient
Experiments
...and 19 more sections

Key Result

Theorem 3.1

Consider the linear-equality-constrained problem (prob:lin-eq) under the smoothness assumption (assumption:linEq_smoothness), and let $\kappa=C_g/\mu_g$ be the condition number of $g$. Then the algorithm (alg:LE-full-alg) finds an $\epsilon$-stationary point (in terms of the gradient mapping; see eq:gradient-mapping) after $T=\widetilde{O}(C_F (F(x_0)-\inf F)\sqrt{\kappa}\,\epsilon^{-2})$ gradient oracle calls.
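Stationarity "in terms of gradient mapping" is the usual notion for problems with a constrained upper-level variable. The sketch below uses the standard definition $G_\eta(x) = (x - \mathrm{Proj}_X(x - \eta \nabla F(x)))/\eta$ and stops once $\|G_\eta(x)\| \le \epsilon$. It is a generic illustration with an assumed box constraint and an assumed exact gradient; the paper's own definition (eq:gradient-mapping) and algorithm may differ in details, and all names here are hypothetical.

```python
import math


def project_box(x, lo, hi):
    """Euclidean projection onto the box [lo, hi]^d."""
    return [min(max(xi, lo), hi) for xi in x]


def gradient_mapping(x, grad, eta, lo, hi):
    """G_eta(x) = (x - Proj(x - eta * grad)) / eta."""
    stepped = project_box([xi - eta * gi for xi, gi in zip(x, grad)], lo, hi)
    return [(xi - si) / eta for xi, si in zip(x, stepped)]


def projected_gradient_descent(grad_F, x0, eta, eps, lo, hi, max_iters=10_000):
    """Iterate x <- x - eta * G_eta(x) until the gradient mapping has norm <= eps."""
    x = list(x0)
    for t in range(max_iters):
        G = gradient_mapping(x, grad_F(x), eta, lo, hi)
        if math.sqrt(sum(g * g for g in G)) <= eps:
            return x, t
        # x - eta * G equals the projected gradient step by construction.
        x = [xi - eta * gi for xi, gi in zip(x, G)]
    return x, max_iters


# Example: F(x) = 0.5 * ||x - c||^2 over the box [0, 1]^2, with c outside the box.
c = [2.0, 0.5]


def grad_F(x):
    return [xi - ci for xi, ci in zip(x, c)]


x_final, iters = projected_gradient_descent(
    grad_F, x0=[0.0, 0.0], eta=0.5, eps=1e-6, lo=0.0, hi=1.0
)
```

At the returned point the plain gradient is not small, because the first coordinate is pinned at the boundary of the box, yet the gradient mapping is. This is the reason constrained guarantees are stated in terms of the gradient mapping rather than the gradient norm.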

Figures (1)

Figure 1: We run the algorithm (alg: PIGD) with the inexact gradient oracle (alg:inexact-gradient-oracle) on the toy bilevel problem (eqn:experiment-bilevel-optimization) with $d_x = 100$, $d_y = 200$, $n_{\text{const}} = d_y / 5$, and accuracy $\alpha = 0.1$. The three panels (fig:convergence-comparison, fig:convergence-comparison-2, fig:computation-comparison) vary the number of iterations, the gradient exactness $\alpha$, and $d_y$, respectively, to compare performance under different settings.

Theorems & Definitions (60)

Definition 2.1
Theorem 3.1
Lemma 3.1
Proof sketch (complete proof in sec:appendix_linear_equality)
Lemma 3.1
Proof sketch (see sec:appendix_linear_equality)
Theorem 4.1
Lemma 4.2
Proof of lem:ZeroOrderApprox
Lemma 4.2
...and 50 more
