Moreau Envelope Based Difference-of-weakly-Convex Reformulation and Algorithm for Bilevel Programs
Lucy L. Gao, Jane J. Ye, Haian Yin, Shangzhi Zeng, Jin Zhang
TL;DR
This work extends bilevel hyperparameter tuning beyond the value-function reformulation, which requires the lower-level problem to be convex jointly in the upper-level and lower-level variables. The proposed Moreau envelope-based reformulation requires convexity only in the lower-level variables. The resulting single-level problem is a difference of weakly convex program, which enables a unified DC framework and is solved via the inexact proximal Difference of Weakly Convex Algorithm (iP-DwCA). Theoretical results establish weak convexity, Lipschitz properties, and convergence (including sequential convergence based on the Kurdyka-Łojasiewicz (KL) property) under standard assumptions. Numerical experiments on elastic net, sparse group lasso, and RBF-SVM demonstrate substantial speedups and competitive accuracy, with early-stopping variants performing particularly well in practice. The approach broadens applicability to hyperparameter tuning in kernel methods and regularized regressions, offering a principled, scalable alternative to grid or random search for complex bilevel models, supported by rigorous analysis and empirical validation.
Abstract
Bilevel programming has emerged as a valuable tool for hyperparameter selection, a central concern in machine learning. In a recent study by Ye et al. (2023), a value function-based difference of convex algorithm was introduced to address bilevel programs. This approach proves particularly powerful when dealing with scenarios where the lower-level problem exhibits convexity in both the upper-level and lower-level variables. Examples of such scenarios include support vector machines and $\ell_1$ and $\ell_2$ regularized regression. In this paper, we significantly expand the range of applications, now requiring convexity only in the lower-level variables of the lower-level program. We present an innovative single-level difference of weakly convex reformulation based on the Moreau envelope of the lower-level problem. We further develop a sequentially convergent Inexact Proximal Difference of Weakly Convex Algorithm (iP-DwCA). To evaluate the effectiveness of the proposed iP-DwCA, we conduct numerical experiments focused on tuning hyperparameters for kernel support vector machines on simulated data.
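To make the role of the Moreau envelope concrete, the following minimal sketch evaluates it in closed form for a toy lower-level objective. Everything in it is an illustrative assumption rather than the paper's setup: the objective $f(x, \theta) = x|\theta|$ with scalar hyperparameter $x > 0$, the names `soft_threshold`, `moreau_envelope`, and `envelope_gap`, and the parameter `gamma`. The toy objective is convex in $\theta$ for fixed $x$ but not jointly convex in $(x, \theta)$, which is the situation the abstract targets. The sketch uses the standard definition $v_\gamma(x, y) = \min_\theta \{ f(x, \theta) + \frac{1}{2\gamma}(\theta - y)^2 \}$ and checks that the gap $f(x, y) - v_\gamma(x, y)$ is nonnegative and vanishes at the lower-level minimizer $y = 0$.

```python
import numpy as np


def soft_threshold(y, tau):
    """Minimizer of tau*|theta| + 0.5*(theta - y)**2 over theta (soft-thresholding)."""
    return np.sign(y) * np.maximum(np.abs(y) - tau, 0.0)


def moreau_envelope(x, y, gamma):
    """Moreau envelope of the toy objective f(x, theta) = x*|theta| at the point y.

    Assumes x > 0 and gamma > 0. The inner minimizer is soft-thresholding
    with threshold gamma * x.
    """
    theta = soft_threshold(y, gamma * x)
    return x * np.abs(theta) + (theta - y) ** 2 / (2.0 * gamma)


def envelope_gap(x, y, gamma):
    """Gap f(x, y) - v_gamma(x, y) for the toy objective (hypothetical helper)."""
    return x * np.abs(y) - moreau_envelope(x, y, gamma)


if __name__ == "__main__":
    for y in (-2.0, 0.0, 0.5, 3.0):
        print(y, envelope_gap(2.0, y, 0.5))
```

In this toy case, requiring the gap to be at most zero is the same as requiring $y$ to solve the lower-level problem, which gives an intuition for how an envelope-based constraint can stand in for the lower-level optimality condition.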
