Bridging Constraints and Stochasticity: A Fully First-Order Method for Stochastic Bilevel Optimization with Linear Constraints

Cac Phan, Kai Wang

TL;DR

This paper tackles stochastic bilevel optimization with linear lower-level constraints by introducing F2CSA, a fully first-order method that achieves finite-time convergence to a $(\delta,\epsilon)$-Goldstein stationary point using only gradient information. A stochastic inexact hypergradient is built via a smoothed penalty Lagrangian, with a bias of $O(\alpha)$ and variance $O(1/N_g)$, leading to a total first-order oracle complexity of $\tilde{O}(\delta^{-1}\epsilon^{-5})$ when inner accuracy and batch sizes are tuned. The approach avoids Hessian computations and remains robust to stochastic noise, with experiments showing competitive convergence and favorable scalability in high-dimensional settings compared to Hessian-based baselines. The results establish the first finite-time guarantees for linearly constrained stochastic bilevel optimization in a fully first-order framework and suggest avenues for variance reduction and broader constraint types to close remaining gaps to optimal rates.
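The bias and variance scaling quoted above can be illustrated with a toy mini-batch estimator. This is a minimal sketch under stated assumptions, not the F2CSA estimator itself: the function names, the constant-offset bias model, and the Gaussian noise are all hypothetical choices made for illustration. It only shows the generic fact that averaging $N_g$ independent noisy samples leaves a fixed bias of size $\alpha$ untouched while shrinking the variance proportionally to $1/N_g$.

```python
import numpy as np


def minibatch_estimate(true_grad, alpha, sigma, n_g, rng):
    """Average n_g noisy samples of a gradient that carries a fixed bias of norm alpha."""
    d = true_grad.shape[0]
    # Hypothetical bias model (illustration only): constant offset with norm alpha.
    bias = alpha * np.ones(d) / np.sqrt(d)
    # Hypothetical noise model: i.i.d. Gaussian noise with std sigma per coordinate.
    noise = rng.normal(0.0, sigma, size=(n_g, d))
    samples = true_grad + bias + noise
    return samples.mean(axis=0)


def empirical_bias_and_variance(alpha, sigma, n_g, trials=2000, d=5, seed=0):
    """Return (norm of empirical bias, trace of empirical covariance) over many trials."""
    rng = np.random.default_rng(seed)
    true_grad = np.arange(1.0, d + 1.0)
    estimates = np.stack(
        [minibatch_estimate(true_grad, alpha, sigma, n_g, rng) for _ in range(trials)]
    )
    bias_norm = np.linalg.norm(estimates.mean(axis=0) - true_grad)
    variance = estimates.var(axis=0).sum()
    return bias_norm, variance


if __name__ == "__main__":
    for n_g in (1, 10, 100):
        b, v = empirical_bias_and_variance(alpha=0.1, sigma=1.0, n_g=n_g)
        print(f"N_g={n_g:4d}  bias~{b:.3f}  variance~{v:.4f}")
```

In this toy model, increasing the batch size reduces only the variance term; the bias is controlled separately by $\alpha$, which is why both quantities have to be tuned together to reach a target accuracy.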

Abstract

This work provides the first finite-time convergence guarantees for linearly constrained stochastic bilevel optimization using only first-order methods, requiring solely gradient information without any Hessian computations or second-order derivatives. We address the unprecedented challenge of simultaneously handling linear constraints, stochastic noise, and finite-time analysis in bilevel optimization, a combination that has remained theoretically intractable until now. While existing approaches either require second-order information, handle only unconstrained stochastic problems, or provide merely asymptotic convergence results, our method achieves finite-time guarantees using gradient-based techniques alone. We develop a novel framework that constructs hypergradient approximations via smoothed penalty functions, using approximate primal and dual solutions to overcome the fundamental challenges posed by the interaction between linear constraints and stochastic noise. Our theoretical analysis provides explicit finite-time bounds on the bias and variance of the hypergradient estimator, demonstrating how approximation errors interact with stochastic perturbations. We prove that our first-order algorithm converges to $(\delta, \epsilon)$-Goldstein stationary points using $\Theta(\delta^{-1}\epsilon^{-5})$ stochastic gradient evaluations, establishing the first finite-time complexity result for this challenging problem class and representing a significant theoretical breakthrough in constrained stochastic bilevel optimization.

Paper Structure

This paper contains 25 sections, 15 theorems, 94 equations, 2 figures, 2 algorithms.

Key Result

Lemma 4.1

Assume $\|\tilde{\lambda}(x) - \lambda^*(x)\| \leq C_{\lambda}\delta$ and that part (iii) of the smoothness assumption holds. Let $\alpha_1 = \alpha^{-2}$, $\alpha_2 = \alpha^{-4}$, and $\tau = \Theta(\delta)$. Then for fixed $(x,y)$:

Figures (2)

  • Figure 1: Loss convergence trajectories for F2CSA, SSIGD, and DSBLO in dimension 50.
  • Figure 2: Computational cost scaling with problem dimension.

Theorems & Definitions (30)

  • Definition 3.1: Goldstein Subdifferential (Goldstein, 1977)
  • Definition 3.2: $(\delta,\epsilon)$-Goldstein Stationarity
  • Remark 4.1: Inner Loop Complexity
  • Lemma 4.1: Lagrangian Gradient Approximation
  • Lemma 4.2: Solution Error
  • Lemma 4.3: Hypergradient Bias Bound
  • Proof: Proof sketch
  • Lemma 4.4: Variance Bound
  • Proof: Proof sketch
  • Theorem 4.1: Accuracy of Stochastic Hypergradient
  • ...and 20 more
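
For readers unfamiliar with the notions named in Definitions 3.1 and 3.2, the commonly used textbook forms are sketched below. The paper's exact statements are not reproduced on this page, so the symbols here are assumptions: $F$ stands for a Lipschitz objective, $\mathbb{B}_\delta(x)$ for the closed ball of radius $\delta$ around $x$, and $\partial F$ for the Clarke subdifferential.

```latex
% Goldstein delta-subdifferential (standard form; notation assumed, not taken from the paper)
\partial_\delta F(x) \;:=\; \operatorname{conv}\Big( \bigcup_{z \in \mathbb{B}_\delta(x)} \partial F(z) \Big)

% (delta, epsilon)-Goldstein stationarity (standard form)
\operatorname{dist}\big( 0,\; \partial_\delta F(x) \big) \;\le\; \epsilon
```

Informally, a point is $(\delta,\epsilon)$-Goldstein stationary when some convex combination of subgradients taken within distance $\delta$ of it has norm at most $\epsilon$.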