An Augmented Lagrangian Value Function Method for Lower-level Constrained Stochastic Bilevel Optimization
Hantao Nie, Jiaxiang Li, Zaiwen Wen
TL;DR
The paper addresses stochastic bilevel optimization (BLO) with nonlinear lower-level constraints. It introduces SALVF, a stochastic augmented Lagrangian value-function method that uses a Moreau envelope to reformulate the bilevel problem as a tractable single-level problem. The authors prove that this stochastic single-level reformulation is equivalent to the original constrained bilevel problem and give non-asymptotic convergence guarantees, with explicit sample complexities, for both SALVF and its variance-reduced variant SALVF-VR. The method is Hessian-free: an inner loop solves a minimax problem built on the augmented Lagrangian, an outer loop runs penalized SGD, and variance reduction improves the rate. Experiments on synthetic problems and real-world tasks (e.g., SVM hyperparameter tuning and weight decay) show better efficiency and accuracy than state-of-the-art baselines.
Abstract
Recently, lower-level constrained bilevel optimization has attracted increasing attention. However, existing methods mostly focus on either deterministic cases or problems with linear constraints. The main challenge in stochastic cases with general constraints is the bias and variance of the hyper-gradient, arising from the inexact solution of the lower-level problem. In this paper, we propose a novel stochastic augmented Lagrangian value function method for solving stochastic bilevel optimization problems with nonlinear lower-level constraints. Our approach reformulates the original bilevel problem using an augmented Lagrangian-based value function and then applies a penalized stochastic gradient method that carefully manages the noise from stochastic oracles. We establish an equivalence between the stochastic single-level reformulation and the original constrained bilevel problem and provide a non-asymptotic rate of convergence for the proposed method. The rate is further enhanced by employing variance reduction techniques. Extensive experiments on synthetic problems and real-world applications demonstrate the effectiveness of our approach.
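To make the terms "augmented Lagrangian" and "value function" concrete, the following is a minimal sketch on a toy problem. It is deterministic and one-dimensional, and it uses the classical method of multipliers rather than the paper's stochastic SALVF scheme. The problem, the function names (`f`, `g`, `solve_lower_level`, `value_function`), and all parameter values are assumptions made for illustration and do not come from the paper.

```python
# Toy illustration only: deterministic, one-dimensional, and NOT the paper's SALVF algorithm.
# Hypothetical lower-level problem (all names and constants are assumptions):
#     minimize_y  f(x, y) = 0.5 * (y - x)**2   subject to  g(y) = y**2 - 1 <= 0


def f(x, y):
    """Lower-level objective for a fixed upper-level variable x."""
    return 0.5 * (y - x) ** 2


def g(y):
    """Nonlinear lower-level inequality constraint, feasible when g(y) <= 0."""
    return y ** 2 - 1.0


def solve_lower_level(x, rho=1.0, outer_iters=50, inner_iters=200, step=0.05):
    """Classical augmented Lagrangian (method of multipliers) for the toy problem.

    For an inequality constraint the augmented Lagrangian is
        L_rho(y, lam) = f(x, y) + (max(0, lam + rho * g(y))**2 - lam**2) / (2 * rho).
    The step size is tuned for this toy problem, not chosen by any general rule.
    """
    y, lam = 0.0, 0.0
    for _ in range(outer_iters):
        # Inner loop: gradient descent on y -> L_rho(y, lam), warm-started at the previous y.
        for _ in range(inner_iters):
            shifted = max(0.0, lam + rho * g(y))  # shifted multiplier at the current y
            grad = (y - x) + shifted * 2.0 * y    # df/dy + shifted * dg/dy
            y -= step * grad
        # Multiplier update, projected onto lam >= 0.
        lam = max(0.0, lam + rho * g(y))
    return y, lam


def value_function(x):
    """Optimal lower-level objective value v(x) = f(x, y*(x))."""
    y_star, _ = solve_lower_level(x)
    return f(x, y_star)


if __name__ == "__main__":
    for x in (0.3, 2.0):
        y_star, lam_star = solve_lower_level(x)
        print(f"x={x}: y*={y_star:.4f}, multiplier={lam_star:.4f}, v(x)={value_function(x):.4f}")
```

For x = 2 the constraint is active, so the iterates should approach y = 1 with multiplier 0.5, which is what the KKT condition (y - x) + 2 * lam * y = 0 gives at y = 1. For x = 0.3 the constraint is inactive and the multiplier stays at zero. A value-function reformulation would then require f(x, y) - v(x) <= 0 together with g(y) <= 0 in a single-level problem. How the paper builds its Moreau-envelope variant of v(x) is not specified in the abstract and is not reproduced here.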
