Constrained Bi-Level Optimization: Proximal Lagrangian Value function Approach and Hessian-free Algorithm

Wei Yao, Chengming Yu, Shangzhi Zeng, Jin Zhang

TL;DR

The paper addresses constrained bi-level optimization (BLO) problems in which the lower-level (LL) constraints couple the upper-level (UL) and LL variables. It introduces a smooth proximal Lagrangian value function $v_{\gamma}(x,y,z)$, which yields a single-level reformulation with smooth constraints. Building on this reformulation, it develops LV-HBA, a Hessian-free, single-loop gradient algorithm that alternates a proximal gradient descent-ascent (GDA) step for the LL min-max problem with projected gradient steps for the upper-level variables, using explicit gradient expressions of $v_{\gamma,r}$. The authors give a non-asymptotic convergence analysis with explicit rates under relatively mild smoothness and convexity assumptions; the analysis does not require LL strong convexity and allows non-singleton LL solution sets. Experiments on synthetic BLOs, SVM hyperparameter tuning, and federated learning show that LV-HBA avoids Hessian computations and converges favorably compared with existing Hessian-free methods. The work offers a scalable framework for constrained BLOs with coupled LL constraints and leaves room for stochastic and accelerated extensions in large-scale ML settings.
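
For orientation, the construction behind $v_{\gamma}$ can be sketched as follows. The notation is an assumption made here for illustration and is not quoted from this page: $F$ is the UL objective, $f$ is the LL objective, $g(x,y) \le 0$ is the coupled LL constraint with $g$ taking values in $\mathbb{R}^p$, $Y$ is the LL constraint set, and $\gamma = (\gamma_1, \gamma_2)$ collects the two proximal parameters that appear in Lemma 3.1.

$$
v_{\gamma}(x,y,z) := \min_{\theta \in Y} \max_{\lambda \in \mathbb{R}^p_+} \Big\{ f(x,\theta) + \lambda^{\top} g(x,\theta) + \frac{1}{2\gamma_1}\|\theta - y\|^2 - \frac{1}{2\gamma_2}\|\lambda - z\|^2 \Big\}
$$

Under this reading, the two proximal terms make the inner problem strongly convex in $\theta$ (for small enough $\gamma_1$) and strongly concave in $\lambda$, so the saddle point is unique and $v_{\gamma}$ is smooth. The single-level reformulation then takes the form

$$
\min_{x,\,y,\,z} \; F(x,y) \quad \text{s.t.} \quad f(x,y) - v_{\gamma}(x,y,z) \le 0, \quad g(x,y) \le 0 .
$$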

Abstract

This paper presents a new approach and algorithm for solving a class of constrained Bi-Level Optimization (BLO) problems in which the lower-level problem involves constraints coupling both upper-level and lower-level variables. Such problems have recently gained significant attention due to their broad applicability in machine learning. However, conventional gradient-based methods unavoidably rely on computationally intensive calculations related to the Hessian matrix. To address this challenge, we begin by devising a smooth proximal Lagrangian value function to handle the constrained lower-level problem. Utilizing this construct, we introduce a single-level reformulation for constrained BLOs that transforms the original BLO problem into an equivalent optimization problem with smooth constraints. Enabled by this reformulation, we develop a Hessian-free gradient-based algorithm, termed the proximal Lagrangian Value function-based Hessian-free Bi-level Algorithm (LV-HBA), that is straightforward to implement in a single-loop manner. Consequently, LV-HBA is especially well-suited for machine learning applications. Furthermore, we offer a non-asymptotic convergence analysis for LV-HBA that eliminates the need for the traditional strong convexity assumption on the lower-level problem and accommodates non-singleton scenarios. Empirical results substantiate the algorithm's superior practical performance.
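
To make the Hessian-free idea concrete, the sketch below runs a plain proximal gradient descent-ascent loop on the regularized lower-level min-max problem for a scalar toy instance, then reads off a gradient of the value function from the saddle point using first-order information only. Everything here is an illustrative assumption rather than the authors' implementation: the toy functions $f(x,\theta) = \tfrac12(\theta - x)^2$ and $g(x,\theta) = \theta + x - 1$, the function name `prox_gda_value_and_grad`, the step size, and the iteration count are all hypothetical. LV-HBA itself is described as taking a single such step per outer iteration rather than solving the inner problem to convergence.

```python
def prox_gda_value_and_grad(x, y, z, gamma1=1.0, gamma2=1.0, eta=0.2, iters=500):
    """Approximate the saddle point of the proximal Lagrangian
         f(x,theta) + lam*g(x,theta) + (theta-y)^2/(2*gamma1) - (lam-z)^2/(2*gamma2)
    over theta in R and lam >= 0, for the hypothetical toy data
         f(x, theta) = 0.5*(theta - x)^2,   g(x, theta) = theta + x - 1.
    Returns (theta, lam, value, grad), where grad = (d/dx, d/dy, d/dz) of the value.
    """
    theta, lam = y, max(z, 0.0)  # warm start at the proximal centers
    for _ in range(iters):
        g = theta + x - 1.0
        # Partial derivatives of the proximal Lagrangian (first-order only).
        d_theta = (theta - x) + lam + (theta - y) / gamma1
        d_lam = g - (lam - z) / gamma2
        # Simultaneous descent in theta and projected ascent in lam.
        theta = theta - eta * d_theta
        lam = max(0.0, lam + eta * d_lam)

    g = theta + x - 1.0
    value = (0.5 * (theta - x) ** 2 + lam * g
             + (theta - y) ** 2 / (2.0 * gamma1)
             - (lam - z) ** 2 / (2.0 * gamma2))
    # Envelope (Danskin-type) formulas: differentiate the proximal Lagrangian
    # in (x, y, z) at the saddle point; no second derivatives are needed.
    grad = (
        -(theta - x) + lam,      # d f/dx + lam * d g/dx
        (y - theta) / gamma1,    # from the proximal term in theta
        (lam - z) / gamma2,      # from the proximal term in lam
    )
    return theta, lam, value, grad


if __name__ == "__main__":
    print(prox_gda_value_and_grad(1.0, 0.0, 0.0))
```

For $x = 1$, $y = 0$, $z = 0$ and $\gamma_1 = \gamma_2 = 1$, solving the stationarity conditions by hand gives the saddle point $\theta^* = \lambda^* = 1/3$ with value $1/3$, which the loop reproduces to numerical precision. The default step size was chosen for this particular toy instance and is not a general recommendation.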

Paper Structure (25 sections, 12 theorems, 140 equations, 4 figures, 3 tables, 1 algorithm)

Key Result

Lemma 3.1

Under Assumptions assump-UL, assump-LL and assump-LLC, let $\gamma_1 \in (0, 1/\rho_f )$, $\gamma_2 >0$, $c_{k+1} \ge c_k$ and $\eta_k \in (\underline{\eta}, \rho_T/L_B^2)$ with $\underline{\eta} > 0$, $\rho_T:= \min\{ 1/\gamma_1 - \rho_f, 1/\gamma_2 \}$ and $L_B := \max\{ L_f + L_g + C_ZL_{g_2} + 1

Figures (4)

  • Figure 1: Comparison between AiPOD, E-AiPOD and LV-HBA on a synthetic problem whose LL is merely convex. Left two figures: initial point $10 \cdot \mathbf{1} \in \mathbb{R}^{300}$. Right two figures: initial point $100 \cdot \mathbf{1} \in \mathbb{R}^{300}$.
  • Figure 2: Left two figures: impact of $p$ in the parameter $c_k$ for LV-HBA. Rightmost figure: time taken to reach a specified accuracy vs. dimension for LV-HBA.
  • Figure 3: Comparison between AiPOD, E-AiPOD and LV-HBA on a BLO with a strongly convex LL objective. Left: initial point $5(\mathbf{1}, \mathbf{1})$ in $\mathbb{R}^{200}$. Right: initial point $10(\mathbf{1}, \mathbf{1})$ in $\mathbb{R}^{200}$.
  • Figure 4: Left: accuracy vs. running time in hyperparameter optimization of an SVM on the diabetes dataset. Middle: accuracy vs. running time in data hyper-cleaning. Right: accuracy vs. communication round in the federated loss function tuning problem.

Theorems & Definitions (24)

  • Lemma 3.1
  • Theorem 3.1
  • Theorem A.1
  • proof
  • Remark A.1
  • Theorem A.2
  • proof
  • Lemma A.1
  • proof
  • Remark A.2
  • ...and 14 more