A Primal-Dual-Assisted Penalty Approach to Bilevel Optimization with Coupled Constraints

Liuyuan Jiang, Quan Xiao, Victor M. Tenorio, Fernando Real-Rojas, Antonio G. Marques, Tianyi Chen

TL;DR

The paper tackles bilevel optimization (BLO) with coupled lower-level constraints (CCs) by introducing a primal-dual-assisted penalty reformulation, which yields a projection-free, fully first-order algorithm named BLOCC. BLOCC solves a smooth outer problem on the penalty objective $F_{\gamma}(x)$ and uses an efficient max-min inner solver to compute hypergradients, achieving a nonasymptotic rate of $\tilde{\mathcal{O}}(\epsilon^{-2.5})$ that improves to $\tilde{\mathcal{O}}(\epsilon^{-1.5})$ when the CCs are affine in $y$. The authors establish differentiability and gradient expressions for the penalty objective, prove convergence under the linear independence constraint qualification (LICQ) and restricted secant inequality (RSI) type conditions, and present two enhanced results for the affine-CC special case with linear convergence. Empirical results on SVM hyperparameter tuning and large-scale transportation network design show that BLOCC scales well and is competitive with state-of-the-art baselines on high-dimensional problems with real-world data, in both machine learning and operations research.
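
As a rough sketch of the reformulation summarized above, the display below writes the constrained lower-level value function and the penalty objective in one common notation. Only $F_{\gamma}(x)$, $v(x)$, $f(x,y)$ and $g(x,y)$ appear in this summary. The symbols $g^c$ (coupled constraint), $\mathcal{Y}(x)$ (lower-level feasible set), $y_g^*(x)$ and $\mu_g^*(x)$ (lower-level solution and its Lagrange multiplier) are assumptions introduced here, and the paper's exact statements may differ.

$$
\begin{aligned}
\mathcal{Y}(x) &:= \{\, y : g^c(x,y) \le 0 \,\}, \\
v(x) &:= \min_{y \in \mathcal{Y}(x)} g(x,y), \\
F_{\gamma}(x) &:= \min_{y \in \mathcal{Y}(x)} \; f(x,y) + \gamma \big( g(x,y) - v(x) \big), \\
\nabla v(x) &= \nabla_x g\big(x, y_g^*(x)\big) + \nabla_x g^c\big(x, y_g^*(x)\big)^{\top} \mu_g^*(x).
\end{aligned}
$$

The last line is the usual envelope-type gradient formula for a constrained value function, with $\nabla_x g^c$ read as the Jacobian of the constraints with respect to $x$. Setting the multiplier term to zero gives the expression used for unconstrained lower levels.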

Abstract

Interest in bilevel optimization has grown in recent years, partly because of its applications to challenging machine-learning problems. Several recent works have focused on developing efficient gradient-based algorithms that solve bilevel optimization problems with provable guarantees. However, the existing literature mainly considers bilevel problems either without constraints or with only simple constraints that do not couple variables across the upper and lower levels, which excludes a range of complex applications. Our paper studies this challenging but less explored scenario and develops a (fully) first-order algorithm, which we term BLOCC, to tackle BiLevel Optimization problems with Coupled Constraints. We establish rigorous convergence theory for the proposed algorithm and demonstrate its effectiveness on two well-known real-world applications: hyperparameter selection in support vector machines (SVMs) and infrastructure planning in transportation networks using real data from the city of Seville.

Paper Structure

This paper contains 42 sections, 20 theorems, 118 equations, 13 figures, 7 tables, 2 algorithms.

Key Result

Lemma 1

Suppose that the assumptions ranging from convexity to LICQ hold and that $v(x)$ is given by its defining equation. Then, it holds that

Figures (13)

  • Figure 1: Calculation of $\nabla v(x)$. The blue line is $v(x)$, the yellow dashed line is calculated by the formulation given in [shen2023penalty, kwon2023fully], and the red dashed line is derived by our BLOCC. It can be seen that $\nabla v(x)$ without the Lagrange multiplier is in the opposite direction of the true gradient.
  • Figure 2: 3-D plot of the upper-level objective $f(x,y)$ of the toy example, with the line $f(x,y)|_{y=x}$ shown in dashed red and the convergence points marked as red dots.
  • Figure 2: Numerical results on the training outcome of our BLOCC in comparison with LV-HBA [yao2024constrained] and GAM [xu2023efficient]. The first row represents accuracy mean $\pm$ standard deviation, and the second row, in brackets, represents the running time until the change in the upper-level objective per update falls below $10^{-5}$.
  • Figure 3: Test accuracy (left), upper loss $f(x,y)$ (middle), and lower loss $g(x,y)$ (right) for the SVM on the diabetes dataset. The experiments are executed for 50 different random train-validation-test splits, with the bold line representing the mean, and the shaded regions being the standard deviation.
  • Figure 4: Plots for diabetes dataset
  • ...and 8 more figures
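
The observation in the Figure 1 caption, that $\nabla v(x)$ computed without the Lagrange multiplier can point the wrong way, can be checked numerically on a small example. The sketch below uses a hypothetical one-dimensional lower-level problem that is not taken from the paper, solves it with plain gradient descent-ascent on the Lagrangian (a generic primal-dual method swapped in for illustration, not the paper's inner solver), and compares the gradient of the value function with and without the multiplier term against a finite-difference estimate. All function names and step sizes are assumptions.

```python
# Hypothetical toy lower-level problem (not from the paper):
#   min_y g(x, y) = y^2 - 0.5*x*y   subject to   g_c(x, y) = x - y <= 0.
# For x > 0 the constraint is active, so y*(x) = x, mu*(x) = 1.5*x, v(x) = 0.5*x^2.

def g(x, y):
    return y * y - 0.5 * x * y

def solve_lower_level(x, step=0.1, iters=2000):
    """Gradient descent-ascent on L(y, mu) = g(x, y) + mu * (x - y) with mu >= 0."""
    y, mu = 0.0, 0.0
    for _ in range(iters):
        grad_y = 2.0 * y - 0.5 * x - mu   # dL/dy
        grad_mu = x - y                   # dL/dmu, i.e. the constraint value
        # Descent in y, ascent in mu; the multiplier is clipped at zero.
        y, mu = y - step * grad_y, max(0.0, mu + step * grad_mu)
    return y, mu

def value_function(x):
    y, _ = solve_lower_level(x)
    return g(x, y)

x = 1.0
y_star, mu_star = solve_lower_level(x)

dg_dx = -0.5 * y_star                             # partial derivative of g in x at y*
dgc_dx = 1.0                                      # partial derivative of g_c in x
grad_naive = dg_dx                                # multiplier term dropped
grad_with_multiplier = dg_dx + mu_star * dgc_dx   # envelope-type formula

# Central finite difference of v(x) as a reference.
h = 1e-4
grad_fd = (value_function(x + h) - value_function(x - h)) / (2.0 * h)

print("y* =", y_star, " mu* =", mu_star)
print("finite difference :", grad_fd)
print("with multiplier   :", grad_with_multiplier)
print("without multiplier:", grad_naive)
```

On this toy problem the multiplier-corrected gradient agrees with the finite-difference estimate, while the uncorrected one has the opposite sign, matching the qualitative behavior the caption describes.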

Theorems & Definitions (33)

  • Lemma 1
  • Theorem 1: Equivalence
  • Lemma 2: Danskin-like theorem for $v(x)$
  • Lemma 3: Danskin-like theorem for $F_{\gamma}(x)$
  • Theorem 2
  • Theorem 3
  • Theorem 4: Inner linear convergence
  • Definition 5
  • Remark 1
  • Lemma 4
  • ...and 23 more