A Primal-Dual-Assisted Penalty Approach to Bilevel Optimization with Coupled Constraints

Liuyuan Jiang; Quan Xiao; Victor M. Tenorio; Fernando Real-Rojas; Antonio G. Marques; Tianyi Chen

TL;DR

The paper tackles bilevel optimization (BLO) with coupled lower-level constraints (CCs) by introducing a primal-dual-assisted penalty reformulation, which yields a projection-free, fully first-order algorithm named BLOCC. BLOCC solves a smooth outer problem on the penalty objective $F_{\gamma}(x)$ and uses an efficient max-min inner solver to compute hypergradients, achieving a nonasymptotic rate of $\tilde{\mathcal{O}}(\epsilon^{-2.5})$ that improves to $\tilde{\mathcal{O}}(\epsilon^{-1.5})$ when the CCs are affine in $y$. The authors establish differentiability and gradient expressions for the penalty objective, prove convergence under the linear independence constraint qualification (LICQ) and RSI-type conditions, and present two enhanced results with linear convergence for the affine-CC special case. Experiments on SVM hyperparameter tuning and large-scale transportation network design show that BLOCC scales well and is competitive with state-of-the-art baselines. Overall, BLOCC offers a first-order, projection-free framework for BLO problems with coupled constraints, including high-dimensional problems built on real-world data.
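To make the penalty idea concrete, here is a minimal sketch on a one-dimensional toy problem. It assumes the usual value-function penalty form, $F_{\gamma}(x) = \min_{y:\, g^c(x,y) \le 0} f(x,y) + \gamma\,(g(x,y) - v(x))$ with $v(x) = \min_{y:\, g^c(x,y) \le 0} g(x,y)$. The functions `f`, `g`, `g_c`, the grids, and the tolerance are all hypothetical choices made for illustration. The inner problems are solved by brute-force grid search, which is not the paper's first-order max-min solver.

```python
# Toy problem (hypothetical, not taken from the paper):
#   upper level: f(x, y) = (x - 1)^2 + y^2
#   lower level: g(x, y) = (y - 2)^2
#   coupled constraint: g_c(x, y) = y - x <= 0
# The lower level pushes y up to min(x, 2), so the bilevel optimum is x = y = 0.5.

FEAS_TOL = 1e-9  # guards the feasibility test against floating-point noise


def f(x, y):
    return (x - 1.0) ** 2 + y ** 2


def g(x, y):
    return (y - 2.0) ** 2


def g_c(x, y):
    return y - x


def grid(lo, hi, n):
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]


Y_GRID = grid(-1.0, 3.0, 401)
X_GRID = grid(0.0, 3.0, 301)


def value_function(x):
    # v(x): best lower-level objective over y satisfying the coupled constraint.
    return min(g(x, y) for y in Y_GRID if g_c(x, y) <= FEAS_TOL)


def penalty_objective(x, gamma):
    # F_gamma(x): upper objective plus gamma times the lower-level optimality gap.
    v = value_function(x)
    return min(
        f(x, y) + gamma * (g(x, y) - v)
        for y in Y_GRID
        if g_c(x, y) <= FEAS_TOL
    )


def argmin_penalty(gamma):
    # Brute-force outer minimization over the x grid.
    return min(X_GRID, key=lambda x: penalty_objective(x, gamma))


x_small = argmin_penalty(0.1)   # weak penalty
x_large = argmin_penalty(10.0)  # strong penalty
print(x_small, x_large)
```

On this toy problem the weak penalty settles at roughly x = 0.89, away from the bilevel solution, while the strong penalty recovers x of about 0.5. This illustrates why the penalty constant has to be large enough for the reformulation to track the original problem.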

Abstract

Interest in bilevel optimization has grown in recent years, partly due to its application to challenging machine-learning problems. Several recent works have centered on developing efficient gradient-based algorithms that solve bilevel optimization problems with provable guarantees. However, the existing literature mainly focuses on bilevel problems either without constraints, or featuring only simple constraints that do not couple variables across the upper and lower levels, which excludes a range of complex applications. Our paper studies this challenging but less explored scenario and develops a (fully) first-order algorithm, which we term BLOCC, to tackle BiLevel Optimization problems with Coupled Constraints. We establish rigorous convergence theory for the proposed algorithm and demonstrate its effectiveness on two well-known real-world applications: hyperparameter selection in support vector machines (SVMs) and infrastructure planning in transportation networks using real data from the city of Seville.

Paper Structure (42 sections, 20 theorems, 118 equations, 13 figures, 7 tables, 2 algorithms)

Introduction
Main contributions
Related works
Primal-dual Penalty-based Reformulation
The challenges in BLO with coupled constraints
The Lagrangian duality-based penalty reformulation
Smoothness of the penalty reformulation
Main Results
BLOCC algorithm
MaxMin Solver for the BLO with inequality CCs
Special case of the MaxMin Solver: $g^c(x, y)$ being affine in $y$ and $\mathcal{Y}=\mathbb{R}^{d_y}$
Numerical Experiments
Toy example
Hyperparameter optimization for SVM
Transportation network design problem
...and 27 more sections

Key Result

Lemma 1

Suppose that Assumptions (convexity) through (LICQ) hold and $v(x)$ is defined as in the paper's equation for $v(x)$. Then, it holds that

Figures (13)

Figure 1: Calculation of $\nabla v(x)$. The blue line is $v(x)$, the yellow dashed line is calculated by the formulation given in [shen2023penalty, kwon2023fully], and the red dashed line is derived by our BLOCC. It can be seen that $\nabla v(x)$ computed without the Lagrange multiplier points in the opposite direction of the true gradient.
Figure 2: 3-D plot of the upper-level objective $f(x,y)$ of the toy example, with the line $f(x,y)|_{y=x}$ shown in dashed red and the convergence points marked as red dots.
Figure 2: Numerical results on the training outcome of our BLOCC in comparison with LV-HBA [yao2024constrained] and GAM [xu2023efficient]. The first row represents accuracy mean $\pm$ standard deviation, and the second row between brackets represents the running time until the upper-level objective's update is smaller than $10^{-5}$.
Figure 3: Test accuracy (left), upper loss $f(x,y)$ (middle), and lower loss $g(x,y)$ (right) for the SVM on the diabetes dataset. The experiments are executed for 50 different random train-validation-test splits, with the bold line representing the mean, and the shaded regions being the standard deviation.
Figure 4: Plots for diabetes dataset
...and 8 more figures
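The point made in the Figure 1 caption, that dropping the Lagrange multiplier can flip the sign of $\nabla v(x)$, can be checked on a small example. The sketch below assumes the standard envelope (Danskin-type) formula for a constrained value function, $\nabla v(x) = \nabla_x g(x, y^*) + \mu^* \nabla_x g^c(x, y^*)$. The toy functions and the closed-form solution are hypothetical and are not the example plotted in the paper.

```python
# Toy lower-level problem (hypothetical, chosen for a closed-form solution):
#   g(x, y)   = (y - 2)^2 + 0.5 * x
#   g_c(x, y) = y - x <= 0
# For x < 2 the constraint is active: y* = x and, from stationarity
# 2 * (y* - 2) + mu* = 0, the multiplier is mu* = 2 * (2 - x).


def lower_solution(x):
    y_star = min(x, 2.0)
    mu_star = max(0.0, 2.0 * (2.0 - x))
    return y_star, mu_star


def v(x):
    # Value function: lower-level objective evaluated at its constrained minimizer.
    y_star, _ = lower_solution(x)
    return (y_star - 2.0) ** 2 + 0.5 * x


def grad_v_with_multiplier(x):
    # Envelope formula: d g / d x + mu* * d g_c / d x.
    _, mu_star = lower_solution(x)
    dg_dx = 0.5
    dgc_dx = -1.0
    return dg_dx + mu_star * dgc_dx


def grad_v_without_multiplier(x):
    # Keeps only d g / d x, which ignores the coupled constraint.
    return 0.5


def finite_difference(fn, x, h=1e-6):
    # Central difference, used here as the reference gradient.
    return (fn(x + h) - fn(x - h)) / (2.0 * h)


x0 = 1.0
print(finite_difference(v, x0))
print(grad_v_with_multiplier(x0))
print(grad_v_without_multiplier(x0))
```

At `x0 = 1.0` the multiplier-aware formula matches the finite-difference slope of $v$, while the multiplier-free estimate has the opposite sign.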

Theorems & Definitions (33)

Lemma 1
Theorem 1: Equivalence
Lemma 2: Danskin-like theorem for $v(x)$
Lemma 3: Danskin-like theorem for $F_{\gamma}(x)$
Theorem 2
Theorem 3
Theorem 4: Inner linear convergence
Definition 5
Remark 1
Lemma 4
...and 23 more
