Efficient Penalty-Based Bilevel Methods: Improved Analysis, Novel Updates, and Flatness Condition

Liuyuan Jiang, Quan Xiao, Lisha Chen, Tianyi Chen

TL;DR

This work advances penalty-based bilevel optimization (BLO) by decoupling the upper- and lower-level variables through a reformulated outer objective $F_\gamma(x)$, which enables constant-step-size gradient-like updates and improves outer-loop efficiency. It introduces ALT-PBGD, an alternating-update method that tightens the convergence rate to $O(\epsilon^{-1})$ outer iterations for uncoupled CC BLOs, and PBGD-Free, a fully single-loop, value-function-free algorithm that minimizes inner computation at the cost of a bias term. To address the potential divergence of PBGD-Free, the authors propose the $(\delta,\alpha)$-flatness condition, which relaxes Lipschitz assumptions and bounds the penalty-induced bias, yielding convergence guarantees to stationary points under mild conditions for both uncoupled and coupled constraint BLOs. The theory is complemented by experiments on SVM hyperparameter optimization and parameter-efficient fine-tuning (PEFT) of large language models, where the proposed methods achieve comparable or better accuracy at substantially lower computational cost.

Abstract

Penalty-based methods have become popular for solving bilevel optimization (BLO) problems, thanks to their effective first-order nature. However, they often require inner-loop iterations to solve the lower-level (LL) problem and small outer-loop step sizes to handle the larger smoothness constant induced by large penalty terms, leading to suboptimal complexity. This work considers general BLO problems with coupled constraints (CCs) and leverages a novel penalty reformulation that decouples the upper- and lower-level variables. This yields an improved analysis of the smoothness constant, enabling larger step sizes and reduced iteration complexity for Penalty-Based Gradient Descent algorithms in ALTernating fashion (ALT-PBGD). Building on the insight of reduced smoothness, we propose PBGD-Free, a novel fully single-loop algorithm that avoids inner loops for the uncoupled constraint BLO. For BLO with CCs, PBGD-Free employs an efficient inner loop with substantially reduced iteration complexity. Furthermore, we propose a novel curvature condition describing the "flatness" of the upper-level objective with respect to the LL variable. This condition relaxes the traditional upper-level Lipschitz requirement, enables smaller penalty constant choices, and results in a negligible penalty gradient term during upper-level variable updates. We provide rigorous convergence analysis and validate the method's efficacy through hyperparameter optimization for support vector machines and fine-tuning of large language models.
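
To make the penalty reformulation concrete, here is a small worked example. It assumes the standard value-function penalty form $F_\gamma(x)=\min_y\{f(x,y)+\gamma(g(x,y)-v(x))\}$ with $v(x)=\min_y g(x,y)$; the abstract does not state the definition, so this form is an assumption rather than the authors' exact formulation. The problem is the unconstrained toy instance from Figure 1, $f(x,y)=y$ and $g(x,y)=(x-y)^2$.

```latex
\begin{align*}
v(x) &= \min_{y}\,(x-y)^2 = 0, \qquad y_g^*(x) = x, \qquad \phi(x) = f(x, y_g^*(x)) = x,\\
y_\gamma^*(x) &= \arg\min_{y}\,\bigl\{\, y + \gamma (x-y)^2 \,\bigr\} = x - \tfrac{1}{2\gamma},\\
F_\gamma(x) &= x - \tfrac{1}{2\gamma} + \gamma \cdot \tfrac{1}{4\gamma^2} = x - \tfrac{1}{4\gamma},\\
\phi(x) - F_\gamma(x) &= \tfrac{1}{4\gamma}, \qquad \nabla F_\gamma(x) = 1 = \nabla \phi(x).
\end{align*}
```

In this instance the penalized objective differs from $\phi(x)$ by $1/(4\gamma)$, so the gap closes at rate $1/\gamma$, which illustrates why penalty methods typically take $\gamma$ large.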

Paper Structure

This paper contains 10 sections, 10 theorems, 69 equations, 5 figures, 2 tables, 3 algorithms.

Key Result

Lemma (Strong Convexity and Proximal PL [karimi2016linear])

If $h:\mathcal{Q} \rightarrow \mathbb{R}$ is $\mu_g$-strongly convex on $\mathcal{S}\subseteq \mathcal{Q}$ and $\mathcal{S}$ is a convex set, then $h$ satisfies the proximal $\mu_g$-PL condition on $\mathcal{S}$.
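
The lemma can be sanity-checked numerically. The sketch below assumes the proximal PL inequality in the form used by Karimi et al. (2016), $\tfrac{1}{2}\mathcal{D}(y,L)\ge\mu\,(h(y)-h^*)$, with the non-smooth part taken as the indicator of $\mathcal{S}$; the paper's own definition is not shown in this excerpt and may be stated differently. The diagonal quadratic, the box $\mathcal{S}=[-1,1]^2$, and all names in the code are hypothetical choices made for illustration.

```python
import random

# Hypothetical test problem (not from the paper): diagonal quadratic on a box.
A = (1.0, 4.0)            # Hessian diagonal, so mu = 1 and L = 4
B = (3.0, -2.0)           # linear term
LO, HI = -1.0, 1.0        # S = [-1, 1]^2, a convex set
MU, L = min(A), max(A)


def clip(v):
    """Project a scalar onto [LO, HI]."""
    return max(LO, min(HI, v))


def h(y):
    return sum(0.5 * a * yi * yi - b * yi for a, b, yi in zip(A, B, y))


def grad_h(y):
    return [a * yi - b for a, b, yi in zip(A, B, y)]


def prox_pl_lhs(y):
    """0.5 * D(y, L), where the inner minimum over S is attained at the
    projected gradient step clip(y - grad / L) (the problem is separable)."""
    g = grad_h(y)
    d = [clip(yi - gi / L) - yi for yi, gi in zip(y, g)]
    inner = sum(gi * di + 0.5 * L * di * di for gi, di in zip(g, d))
    return -L * inner


# Constrained minimum: separable, so clip the coordinate-wise minimizer b / a.
H_STAR = h([clip(b / a) for a, b in zip(A, B)])


def smallest_slack(num_points=1000, seed=0):
    """Smallest value of 0.5 * D(y, L) - mu * (h(y) - h*) over random y in S."""
    rng = random.Random(seed)
    worst = float("inf")
    for _ in range(num_points):
        y = [rng.uniform(LO, HI) for _ in A]
        worst = min(worst, prox_pl_lhs(y) - MU * (h(y) - H_STAR))
    return worst


print("smallest slack over samples:", smallest_slack())
```

A non-negative slack at every sampled point is consistent with the lemma; it is a check on one instance, not a proof.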

Figures (5)

  • Figure 1: Example of non-differentiable $\phi(x)$ due to the constraint set $\mathcal{Y}(x)$. Consider $f(x,y)=y$, $g(x,y)=(x-y)^2$. For the unconstrained case $\mathcal{Y}(x)=\mathbb{R}$, $\phi(x)=x$ is differentiable. For the uncoupled constrained case $\mathcal{Y}(x)=[-1/2,1/2]$, $\phi(x)$ is not differentiable at $x=\pm 1/2$. For the coupled constrained case $\mathcal{Y}(x)=\{y: y \le x/2\}$, $\phi(x)$ is not differentiable at $x=0$.
  • Figure 2: Comparison of V-PBGD, F$^2$SA, and PBGD-Free updates in the uncoupled constraint setting. V-PBGD [shen2023penalty] (top) is a joint-PBGD method for \ref{eq: joint penalty problem}, and F$^2$SA [kwon2023penalty] (middle) is an alternating-PBGD method for \ref{eq: F gam function}. Both refine the LL variable through multiple inner iterations before updating $x_t$. In contrast, PBGD-Free (bottom) performs only a single-step update from $y_t$ to $y_{t+1}$, providing a more efficient but potentially less accurate approximation $\nabla_x f(x_t,y_{t+1})\approx\nabla F_\gamma(x_t)$.
  • Figure 3: $\nabla_x \tilde{F}_\gamma(x, y)$ (spheres) vs. $\nabla F_\gamma(x)$ (lines) with different $\gamma$ for the problem in Example \ref{example:toy_example_1}.
  • Figure 4: Implicit gradient on the boundary. The LL objective is given by $g(x,y)=xy_1+(1+x)y_2^2-y_2$, with LL domain $\mathcal{Y}$ being the unit ball. The orange stars depict the trajectory of $y_g^*(x)$ as $x$ varies from $1$ to $3$.
  • Figure 5: Illustration showing that PBGD-Free fails in Example \ref{example:bias} but works well in PEFT. The left plot shows $f(x,y)$ and $f(x, y^*_g(x))$ in Example \ref{example:bias}, with red and blue dots marking the points to which PBGD-Free and F$^2$SA converge, respectively. The middle plot shows the trajectory of updates in PEFT. The orange, blue, and green contours are the landscapes of $f_{\text{DPO}}(x,y)$, $g_{\text{SFT}}(x,y)$, and $\tilde{F}_\gamma (x,y)$, respectively. The right plot presents convergence vs. time in PEFT, showing faster convergence of PBGD-Free. (See Sec. \ref{app:toy_example} for details.)
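
The kinks described in the Figure 1 caption can be reproduced with a few lines of code. The sketch below uses the caption's $f(x,y)=y$ and $g(x,y)=(x-y)^2$; because $g$ is a squared distance in one dimension, the LL solution is the projection of $x$ onto $\mathcal{Y}(x)$. The function names and the finite-difference kink test are illustrative choices, not part of the paper.

```python
def f(x, y):
    """Upper-level objective from the Figure 1 caption."""
    return y


def y_star_unconstrained(x):
    return x                          # argmin_y (x - y)^2 over R


def y_star_uncoupled(x):
    return max(-0.5, min(0.5, x))     # projection of x onto [-1/2, 1/2]


def y_star_coupled(x):
    return min(x, x / 2.0)            # projection of x onto {y : y <= x/2}


def phi(y_star, x):
    """Hyper-objective phi(x) = f(x, y*(x))."""
    return f(x, y_star(x))


def one_sided_slopes(y_star, x, eps=1e-6):
    """Left and right finite-difference slopes of phi at x."""
    left = (phi(y_star, x) - phi(y_star, x - eps)) / eps
    right = (phi(y_star, x + eps) - phi(y_star, x)) / eps
    return left, right


def has_kink(y_star, x, tol=1e-3):
    """True when the one-sided slopes disagree, i.e. phi is not differentiable at x."""
    left, right = one_sided_slopes(y_star, x)
    return abs(left - right) > tol


for name, solver, points in [
    ("unconstrained", y_star_unconstrained, [-0.5, 0.0, 0.5]),
    ("uncoupled", y_star_uncoupled, [-0.5, 0.0, 0.5]),
    ("coupled", y_star_coupled, [-0.5, 0.0, 0.5]),
]:
    kinks = [x for x in points if has_kink(solver, x)]
    print(name, "kinks at", kinks)
```

The one-sided slopes disagree only at $x=\pm 1/2$ in the uncoupled case and at $x=0$ in the coupled case, which matches the caption.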

Theorems & Definitions (29)

  • Definition: Lipschitz Continuity and Smoothness
  • Definition: Convexity and Strong Convexity
  • Definition: Projection
  • Definition: Directional Derivative [bonnans2013perturbation]
  • Definition: Tangent Cone & Critical Cone
  • Remark
  • Definition: Generalized (Proximal) Gradient
  • Definition: Proximal PL, EB, and QG
  • Lemma: Strong Convexity and Proximal PL [karimi2016linear]
  • Lemma: Equivalence of PL, EB, and QG Conditions [liao2024error]
  • ...and 19 more