Moreau Envelope Based Difference-of-weakly-Convex Reformulation and Algorithm for Bilevel Programs

Lucy L. Gao; Jane J. Ye; Haian Yin; Shangzhi Zeng; Jin Zhang

TL;DR

This work extends bilevel hyperparameter tuning by replacing the value-function reformulation, which requires the lower-level problem to be convex in both the upper- and lower-level variables, with a Moreau envelope-based reformulation that requires convexity only in the lower-level variables. The resulting single-level problem is a difference-of-weakly-convex program. This structure fits a unified DC-type framework, and the problem is solved by the inexact proximal Difference of Weakly Convex Algorithm (iP-DwCA). The theoretical results establish weak convexity and Lipschitz properties of the Moreau envelope function, as well as convergence of iP-DwCA (including sequential convergence under the Kurdyka-Łojasiewicz (KL) property) under standard assumptions. Numerical experiments on elastic net, sparse group lasso, and RBF-SVM demonstrate substantial speedups and competitive accuracy, with early-stopping variants performing particularly well in practice. The approach broadens bilevel hyperparameter tuning to kernel methods and regularized regression, offering a principled, scalable alternative to grid or random search for complex bilevel models. Overall, the paper contributes a solid methodological advance and a practical algorithmic tool for bilevel hyperparameter selection in machine learning, supported by rigorous analysis and empirical validation.
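To illustrate the kind of proximal DC iteration the TL;DR refers to, here is a minimal Python sketch of an exact proximal DCA on a hypothetical toy program $\phi(z) = g(z) - h(z)$. It uses $g(z) = \frac12\|z-b\|^2 + \lambda\|z\|_1$ and $h(z) = \frac{\mu}{2}\|z\|^2$. The functions, the data `b`, and the parameters `lam`, `mu`, `rho` are assumptions chosen so that each subproblem has a closed form. Both parts are convex, which is the special case of weak convexity with modulus zero. This is not the paper's iP-DwCA, which handles weakly convex parts and inexact subproblem solves.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal map of t*||.||_1, applied elementwise.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def objective(z, b, lam, mu):
    # Toy DC objective phi(z) = g(z) - h(z) with
    # g(z) = 0.5*||z - b||^2 + lam*||z||_1 and h(z) = 0.5*mu*||z||^2 (hypothetical choices).
    return 0.5 * np.sum((z - b) ** 2) + lam * np.sum(np.abs(z)) - 0.5 * mu * np.sum(z ** 2)

def proximal_dca(b, lam=0.3, mu=0.5, rho=1.0, iters=200):
    """Exact proximal DCA on the toy program (illustrative, not the paper's iP-DwCA).

    Each step linearizes h at z_k (gradient xi_k = mu * z_k) and solves
        argmin_z g(z) - <xi_k, z> + (rho/2) * ||z - z_k||^2,
    whose closed form is soft_threshold((b + xi_k + rho*z_k) / (1 + rho), lam / (1 + rho)).
    """
    z = np.zeros_like(b)
    history = [objective(z, b, lam, mu)]
    for _ in range(iters):
        xi = mu * z  # gradient of the subtracted convex part h at the current iterate
        z = soft_threshold((b + xi + rho * z) / (1.0 + rho), lam / (1.0 + rho))
        history.append(objective(z, b, lam, mu))
    return z, history
```

For example, with `b = np.array([1.0, -0.2, 0.8])` and the default parameters, the iterates approach `[1.4, 0.0, 1.0]`, which is the minimizer $\mathrm{soft}(b,\lambda)/(1-\mu)$ of this toy objective. The objective values never increase, reflecting the usual sufficient-decrease property of proximal DCA.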

Abstract

Bilevel programming has emerged as a valuable tool for hyperparameter selection, a central concern in machine learning. In a recent study by Ye et al. (2023), a value function-based difference of convex algorithm was introduced to address bilevel programs. This approach proves particularly powerful when dealing with scenarios where the lower-level problem exhibits convexity in both the upper-level and lower-level variables. Examples of such scenarios include support vector machines and $\ell_1$ and $\ell_2$ regularized regression. In this paper, we significantly expand the range of applications, now requiring convexity only in the lower-level variables of the lower-level program. We present an innovative single-level difference of weakly convex reformulation based on the Moreau envelope of the lower-level problem. We further develop a sequentially convergent Inexact Proximal Difference of Weakly Convex Algorithm (iP-DwCA). To evaluate the effectiveness of the proposed iP-DwCA, we conduct numerical experiments focused on tuning hyperparameters for kernel support vector machines on simulated data.
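The following minimal Python sketch makes the gap constraint behind the Moreau envelope reformulation concrete. It uses a hypothetical one-dimensional lower-level problem $f(x,\theta) = \frac12(\theta-a)^2 + x|\theta|$, a lasso-type loss with hyperparameter $x \ge 0$ and datum $a$. These are assumptions chosen only because the envelope then has a closed form. The sketch evaluates $v_\gamma(x,y) = \min_\theta f(x,\theta) + \frac{1}{2\gamma}(\theta-y)^2$. Since $\theta = y$ is always a candidate, $f(x,y) - v_\gamma(x,y) \ge 0$. For convex $f(x,\cdot)$, this gap vanishes exactly when $y$ minimizes $f(x,\cdot)$. The code illustrates the idea only and is not the paper's implementation.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal map of t*|.|: sign(z) * max(|z| - t, 0).
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lower_level_obj(x, theta, a):
    # Hypothetical lower-level objective f(x, theta) = 0.5*(theta - a)^2 + x*|theta|.
    return 0.5 * (theta - a) ** 2 + x * np.abs(theta)

def moreau_envelope(x, y, a, gamma):
    """Return (v_gamma(x, y), minimizer) for the toy lower-level objective.

    The two quadratics 0.5*(theta - a)^2 + (theta - y)^2 / (2*gamma) combine into
    (w/2)*(theta - c)^2 + const with w = 1 + 1/gamma and c = (a + y/gamma) / w,
    so the minimizer is a soft-threshold of c at level x / w.
    """
    w = 1.0 + 1.0 / gamma
    c = (a + y / gamma) / w
    theta_star = soft_threshold(c, x / w)
    value = lower_level_obj(x, theta_star, a) + (theta_star - y) ** 2 / (2.0 * gamma)
    return value, theta_star
```

For example, with $a = 2$, $x = 0.5$, $\gamma = 1$, the lower-level solution is $y = 1.5$, and the gap $f(x,y) - v_\gamma(x,y)$ is zero there. The point $y = 0$ gives a gap of $0.5625$.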

Paper Structure

This paper contains 24 sections, 14 theorems, 112 equations, 3 figures, 5 tables, and 2 algorithms.

Introduction
Moreau envelope reformulation of bilevel programs
Moreau envelope function and its proximal mapping
Moreau envelope function reformulation of (BP)
Properties of the Moreau envelope function
Weak convexity
Lipschitz continuity
Sensitivity analysis of the Moreau envelope function
iP-DwCA: Inexact Proximal Difference of Weakly Convex Algorithm
Algorithm design
Algorithm steps
Theoretical Investigations
Inexact proximal DC algorithms for standard DC program
Convergence analysis of iP-DwCA
Bilevel Hyperparameter Tuning Examples
...and 9 more sections

Key Result

Theorem 1

Let $\gamma >0$. Under Assumption 1, ${\rm (VP)}_\gamma$ is equivalent to ${\rm (BP)}$.
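The theorem refers to ${\rm (BP)}$ and ${\rm (VP)}_\gamma$ without defining them here. The following is a sketch of how such formulations are commonly written. It assumes an upper-level objective $F$, a lower-level objective $f$, lower-level constraints $g(x,\theta) \le 0$, and sets $X$, $Y$. All of this notation is assumed, and Assumption 1 itself is not reproduced. Because $\theta = y$ is feasible whenever $g(x,y) \le 0$, one always has $v_\gamma(x,y) \le f(x,y)$. The constraint in ${\rm (VP)}_\gamma$ therefore forces equality. Under convexity in the lower-level variable, equality holds exactly when $y \in S(x)$.

$$
{\rm (BP)}\qquad \min_{x\in X,\,y\in Y} F(x,y) \quad \text{s.t.} \quad y \in S(x) := \operatorname*{arg\,min}_{\theta\in Y}\,\{ f(x,\theta) : g(x,\theta)\le 0 \},
$$

$$
v_\gamma(x,y) := \inf_{\theta\in Y}\Big\{ f(x,\theta) + \frac{1}{2\gamma}\|\theta-y\|^2 : g(x,\theta)\le 0 \Big\},
$$

$$
{\rm (VP)}_\gamma\qquad \min_{x\in X,\,y\in Y} F(x,y) \quad \text{s.t.} \quad f(x,y) - v_\gamma(x,y) \le 0,\quad g(x,y)\le 0.
$$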

Figures (3)

Figure 1: Generated synthetic data points
Figure 2: Decision region of initial point
Figure 3: Decision region of output after 10 iterations.

Theorems & Definitions (20)

Theorem 1
Theorem 2
Theorem 3
Proposition 4: Partial subdifferentiation
Theorem 5
Proposition 6
Definition 7
Definition 8
Proposition 9
Definition 10: Kurdyka-Łojasiewicz property
...and 10 more
