An Augmented Lagrangian Value Function Method for Lower-level Constrained Stochastic Bilevel Optimization

Hantao Nie, Jiaxiang Li, Zaiwen Wen

TL;DR

The paper tackles stochastic bilevel optimization (BLO) with nonlinear lower-level constraints by introducing SALVF, a stochastic augmented Lagrangian value-function framework that uses the Moreau envelope to reformulate BLO as a tractable single-level problem. It establishes equivalence between BLO and the stochastic single-level reformulation, and gives non-asymptotic convergence guarantees with explicit sample complexities for both SALVF and its variance-reduced variant, SALVF-VR. The approach is Hessian-free: an inner loop solves a minimax problem built on the augmented Lagrangian, and an outer loop runs penalized SGD; variance reduction improves the rate. Experiments on synthetic problems and real-world tasks (e.g., SVM hyperparameter tuning and weight decay) show better efficiency and accuracy than state-of-the-art baselines.
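
For orientation, the generic shape of a lower-level constrained bilevel problem and its classical value-function reformulation can be written as below. This is a textbook sketch, not the paper's exact formulation: the symbols $F$, $G$, $h$, and $v$ are assumed names (in the stochastic setting $F$ and $G$ would be expectations over random samples), and the paper's augmented Lagrangian and Moreau-envelope smoothing of $v$ are not reproduced here.

```latex
\begin{aligned}
&\text{Bilevel problem:} && \min_{x,\,y}\; F(x,y) \quad \text{s.t.}\quad y \in \arg\min_{y'} \bigl\{\, G(x,y') : h(x,y') \le 0 \,\bigr\}, \\
&\text{Value function:} && v(x) := \min_{y'} \bigl\{\, G(x,y') : h(x,y') \le 0 \,\bigr\}, \\
&\text{Single-level form:} && \min_{x,\,y}\; F(x,y) \quad \text{s.t.}\quad G(x,y) - v(x) \le 0,\;\; h(x,y) \le 0 .
\end{aligned}
```

The constraint $G(x,y) - v(x) \le 0$ together with $h(x,y) \le 0$ forces $y$ to be lower-level optimal, which is why the single-level form has the same feasible set as the bilevel problem.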

Abstract

Recently, lower-level constrained bilevel optimization has attracted increasing attention. However, existing methods mostly focus on either deterministic cases or problems with linear constraints. The main challenge in stochastic cases with general constraints is the bias and variance of the hyper-gradient, arising from the inexact solution of the lower-level problem. In this paper, we propose a novel stochastic augmented Lagrangian value function method for solving stochastic bilevel optimization problems with nonlinear lower-level constraints. Our approach reformulates the original bilevel problem using an augmented Lagrangian-based value function and then applies a penalized stochastic gradient method that carefully manages the noise from stochastic oracles. We establish an equivalence between the stochastic single-level reformulation and the original constrained bilevel problem and provide a non-asymptotic rate of convergence for the proposed method. The rate is further enhanced by employing variance reduction techniques. Extensive experiments on synthetic problems and real-world applications demonstrate the effectiveness of our approach.
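
To make the idea of an augmented Lagrangian-based value function concrete, here is a minimal sketch on a one-dimensional toy problem. It is not the paper's SALVF algorithm: it is a deterministic method of multipliers with plain gradient descent as the inner solver, and the toy objective, the constraint, and all function names and step sizes are assumptions chosen for illustration. The lower-level problem is $\min_y \tfrac{1}{2}(y - x)^2$ subject to $y - 1 \le 0$.

```python
# Toy lower-level problem (hypothetical, for illustration only):
#     min_y 0.5 * (y - x)^2   subject to   h(y) = y - 1 <= 0
# Solved with a deterministic method of multipliers, not the paper's SALVF.


def objective_grad(x, y):
    """Gradient in y of the toy lower-level objective 0.5 * (y - x)^2."""
    return y - x


def constraint(y):
    """Inequality constraint h(y) = y - 1, feasible when h(y) <= 0."""
    return y - 1.0


def constraint_grad(y):
    """Gradient of h(y) = y - 1 with respect to y."""
    return 1.0


def solve_lower_level(x, rho=1.0, outer_iters=60, inner_iters=200, step=0.25):
    """Return an approximate lower-level solution y and multiplier lam.

    The inner loop runs gradient descent in y on the augmented Lagrangian
        g(x, y) + (max(0, lam + rho * h(y))^2 - lam^2) / (2 * rho),
    and the outer loop applies the multiplier update
        lam <- max(0, lam + rho * h(y)).
    """
    y, lam = 0.0, 0.0
    for _ in range(outer_iters):
        for _ in range(inner_iters):
            shifted = max(0.0, lam + rho * constraint(y))
            grad_y = objective_grad(x, y) + shifted * constraint_grad(y)
            y -= step * grad_y
        lam = max(0.0, lam + rho * constraint(y))
    return y, lam


def lower_level_value(x):
    """Approximate value function v(x): the constrained minimum of the objective."""
    y, _ = solve_lower_level(x)
    return 0.5 * (y - x) ** 2


if __name__ == "__main__":
    y_star, lam_star = solve_lower_level(2.0)
    print("y* ~", y_star, "lambda* ~", lam_star, "v(2) ~", lower_level_value(2.0))
```

For $x = 2$ the constraint is active, so the iterates approach $y = 1$ with multiplier $1$; for $x = 0.5$ the constraint is inactive and the multiplier stays at zero. A stochastic method would replace the exact gradients with sampled ones, which is where the bias and variance discussed in the abstract enter.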

Paper Structure

This paper contains 30 sections, 40 theorems, 209 equations, 3 figures, 1 table, 3 algorithms.

Key Result

Theorem 3.1

Suppose that the Lipschitz continuity, convexity, and LICQ assumptions hold and that $\gamma_1, \gamma_2 > 0$ are fixed parameters.

1. Assume $(x^*, y^*)$ is a global solution to the bilevel problem (BLO), $c_1 \geq \frac{L}{2 \mu_G} \epsilon^{-1}$, and $c_2 \geq (c_1)^2 B^2 \epsilon^{-1}$. There exists $z^* \in$ …
2. By taking $c_1 = c_1^* + 2 := \frac{L}{2 \mu_G} \epsilon^{-1} + 2$ and $c_2 = c_2^* + 2 :=$ …

Figures (3)

  • Figure 1: The converged points of the main algorithm.
  • Figure 2: The performance of SALVF compared with baselines on SVM hyperparameter optimization. The abbreviations "test acc." and "iter." stand for test accuracy and iterations, respectively. The curves are averaged over 10 random seeds. The two test-accuracy-versus-time panels (the second on the fourclass dataset) are clipped at maximum iterations of 120 and 60, respectively.
  • Figure 3: The performance of SALVF compared with baselines on the digit dataset. The curves are averaged over 10 random seeds. The test-error-versus-time curves are clipped at a maximum of 50 iterations.

Theorems & Definitions (48)

  • Remark 3.1
  • Theorem 3.1
  • Theorem 3.2
  • Theorem 3.3
  • Lemma 3.1
  • Theorem 3.4
  • Corollary 3.1
  • Remark 3.2
  • Remark 3.3
  • Theorem 3.5
  • ...and 38 more