Moreau Envelope Based Difference-of-weakly-Convex Reformulation and Algorithm for Bilevel Programs

Lucy L. Gao, Jane J. Ye, Haian Yin, Shangzhi Zeng, Jin Zhang

TL;DR

This work extends bilevel hyperparameter tuning by replacing the value-function reformulation, which requires the lower-level problem to be convex in both the upper- and lower-level variables, with a Moreau-envelope-based reformulation that only requires convexity in the lower-level variables. The resulting single-level problem is a difference-of-weakly-convex (DwC) program, which can be handled within a DC framework, and is solved by the inexact proximal Difference of Weakly Convex Algorithm (iP-DwCA). Theoretical results establish weak convexity, Lipschitz properties, and convergence (including sequential convergence under the Kurdyka-Łojasiewicz (KL) property) under standard assumptions, while numerical experiments on elastic net, sparse group lasso, and RBF-SVM demonstrate substantial speedups and competitive accuracy, with early-stopping variants performing particularly well in practice. The approach broadens the applicability of bilevel hyperparameter tuning to kernel methods and regularized regression, offering a principled, scalable alternative to grid or random search. Overall, the paper contributes a solid methodological advance and a practical algorithmic tool for bilevel hyperparameter selection in machine learning, supported by rigorous analysis and empirical validation.
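
The DwC-to-DC idea can be illustrated with a generic proximal DC iteration on a hypothetical one-dimensional toy function. This is a minimal sketch, not the paper's iP-DwCA, which works on the constrained bilevel reformulation and evaluates its linearization only inexactly; the names phi, grad_H, proximal_dc_step, run, RHO, and the proximal weight beta are our own. Adding (RHO/2) z^2 to both weakly convex parts turns the difference into a genuine DC decomposition; each step then linearizes the subtracted convex part at the current point and minimizes the convex model plus a proximal term, which here has a closed form.

```python
import math

# Hypothetical toy DwC objective (illustration only, not from the paper):
#   phi(z) = g(z) - h(z),  g(z) = 0.5 * z**2 (convex),  h(z) = cos(z).
# h is RHO-weakly convex with RHO = 1 because h''(z) = -cos(z) >= -1.
# Adding 0.5 * RHO * z**2 to both parts gives a DC split with convex parts:
#   phi(z) = G(z) - H(z),  G(z) = 0.5 * (1 + RHO) * z**2,  H(z) = cos(z) + 0.5 * RHO * z**2.
RHO = 1.0


def phi(z: float) -> float:
    return 0.5 * z**2 - math.cos(z)


def grad_H(z: float) -> float:
    # H'(z) = -sin(z) + RHO * z
    return -math.sin(z) + RHO * z


def proximal_dc_step(z_k: float, beta: float) -> float:
    # Minimize G(z) - H'(z_k) * z + (beta / 2) * (z - z_k)**2.
    # Setting the derivative (1 + RHO) * z - H'(z_k) + beta * (z - z_k) to zero:
    return (grad_H(z_k) + beta * z_k) / (1.0 + RHO + beta)


def run(z0: float, beta: float = 1.0, iters: int = 100) -> list[float]:
    zs = [z0]
    for _ in range(iters):
        zs.append(proximal_dc_step(zs[-1], beta))
    return zs


zs = run(z0=2.0)
print(zs[-1], phi(zs[-1]))
```

Because the subtracted part H is convex, each step decreases phi by at least (beta/2)(z_{k+1} - z_k)^2, and the iterates started at z0 = 2.0 approach the global minimizer z = 0, where phi = -1.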

Abstract

Bilevel programming has emerged as a valuable tool for hyperparameter selection, a central concern in machine learning. In a recent study by Ye et al. (2023), a value-function-based difference of convex algorithm was introduced to address bilevel programs. This approach proves particularly powerful when dealing with scenarios where the lower-level problem exhibits convexity in both the upper-level and lower-level variables. Examples of such scenarios include support vector machines and $\ell_1$- and $\ell_2$-regularized regression. In this paper, we significantly expand the range of applications, now requiring convexity only in the lower-level variables of the lower-level program. We present an innovative single-level difference of weakly convex reformulation based on the Moreau envelope of the lower-level problem. We further develop a sequentially convergent Inexact Proximal Difference of Weakly Convex Algorithm (iP-DwCA). To evaluate the effectiveness of the proposed iP-DwCA, we conduct numerical experiments focused on tuning hyperparameters for kernel support vector machines on simulated data.
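
To make the Moreau envelope reformulation concrete, here is a minimal NumPy sketch under assumptions of our own: the lower-level problem is ridge regression with hyperparameter x, f(x, y) = 0.5 ||A y - b||^2 + 0.5 x ||y||^2, which is convex in y for fixed x but not jointly convex in (x, y), so it falls outside the jointly convex setting yet inside the one described above. The functions f, moreau_envelope, lower_level_solution and the random data are hypothetical. The sketch computes v_gamma(x, y) = min over theta of f(x, theta) + ||theta - y||^2 / (2 gamma) in closed form and checks that the gap f(x, y) - v_gamma(x, y) is nonnegative and vanishes at the lower-level minimizer, which is why a constraint of the form f - v_gamma <= 0 can stand in for lower-level optimality.

```python
import numpy as np

# Hypothetical lower-level problem (not from the paper): ridge regression
#   f(x, y) = 0.5 * ||A y - b||^2 + 0.5 * x * ||y||^2,  with x > 0.


def f(x, y, A, b):
    r = A @ y - b
    return 0.5 * (r @ r) + 0.5 * x * (y @ y)


def moreau_envelope(x, y, A, b, gamma):
    """Return v_gamma(x, y) = min_theta f(x, theta) + ||theta - y||^2 / (2 gamma)
    and its minimizer. The inner problem is a strongly convex quadratic, so
    theta solves (A^T A + (x + 1/gamma) I) theta = A^T b + y / gamma."""
    n = A.shape[1]
    H = A.T @ A + (x + 1.0 / gamma) * np.eye(n)
    theta = np.linalg.solve(H, A.T @ b + y / gamma)
    d = theta - y
    return f(x, theta, A, b) + (d @ d) / (2.0 * gamma), theta


def lower_level_solution(x, A, b):
    """Exact lower-level minimizer y*(x) = (A^T A + x I)^{-1} A^T b."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + x * np.eye(n), A.T @ b)


rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)
x, gamma = 0.5, 1.0

y_star = lower_level_solution(x, A, b)
y_other = y_star + 0.1  # any point that is not the lower-level minimizer

gaps = {}
for name, y in [("y_star", y_star), ("y_other", y_other)]:
    v, _ = moreau_envelope(x, y, A, b, gamma)
    gaps[name] = f(x, y, A, b) - v
    print(f"{name}: f - v_gamma = {gaps[name]:.2e}")
```

Taking theta = y in the inner problem shows v_gamma(x, y) <= f(x, y) for every y, so the gap printed for y_star is zero up to rounding, while the gap for y_other is strictly positive.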

Paper Structure

This paper contains 24 sections, 14 theorems, 112 equations, 3 figures, 5 tables, and 2 algorithms.

Key Result

Theorem 1

Let $\gamma > 0$. Under Assumption 1, the Moreau envelope reformulation ${\rm (VP)}_\gamma$ is equivalent to the original bilevel program ${\rm (BP)}$.
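
For reference, the two problems in Theorem 1 can be sketched as follows, in the form we assume the paper uses and omitting any upper-level constraints; the symbols $F$ (upper-level objective), $f$ (lower-level objective), $g$ (lower-level constraints), and $S(x)$ are our notation, and Assumption 1 itself is not reproduced here.

$$
\begin{aligned}
{\rm (BP)}:\quad & \min_{x,\,y}\; F(x,y) \quad \text{s.t.} \quad y \in S(x) := \operatorname*{argmin}_{\theta}\,\{ f(x,\theta) : g(x,\theta) \le 0 \},\\
v_\gamma(x,y) :=\quad & \inf_{\theta}\,\Big\{ f(x,\theta) + \tfrac{1}{2\gamma}\|\theta - y\|^2 : g(x,\theta) \le 0 \Big\},\\
{\rm (VP)}_\gamma:\quad & \min_{x,\,y}\; F(x,y) \quad \text{s.t.} \quad f(x,y) - v_\gamma(x,y) \le 0,\quad g(x,y) \le 0.
\end{aligned}
$$

Because $\theta = y$ is feasible in the inner problem whenever $g(x,y) \le 0$, one always has $v_\gamma(x,y) \le f(x,y)$, so the first constraint of ${\rm (VP)}_\gamma$ forces $f(x,y) = v_\gamma(x,y)$. When $f(x,\cdot)$ is convex and $\{\theta : g(x,\theta) \le 0\}$ is convex, this equality holds exactly when $y$ minimizes $f(x,\cdot)$ over that set, i.e., $y \in S(x)$.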

Figures (3)

  • Figure 1: Generated synthetic data points
  • Figure 2: Decision region of initial point
  • Figure 3: Decision region of the output after 10 iterations

Theorems & Definitions (20)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Proposition 4: Partial subdifferentiation
  • Theorem 5
  • Proposition 6
  • Definition 7
  • Definition 8
  • Proposition 9
  • Definition 10: Kurdyka-Łojasiewicz property
  • ...and 10 more