An Efficient Smoothing Damped Newton Method for Large-Scale Mathematical Programs with Equilibrium Constraints

Yixin Wang, Qingna Li, Liwei Zhang

TL;DR

This work tackles the challenge of large-scale bilevel hyperparameter optimization for $L_1$-SVC by reformulating the problem as a mathematical program with equilibrium constraints (MPEC) and solving it with a package-free smoothing damped Newton method (SDNM). A Fischer-Burmeister-based smoothing replaces the complementarity constraints, producing a tractable NLP$_\epsilon$ whose KKT system is solved via a damped Newton method. The authors prove that, under MPEC-LICQ, accumulation points of the smoothing process are C-stationary, and that the subproblem solver enjoys global convergence with a quadratic local rate under a second-order condition (Assumption 2). Numerical tests on LIBSVM datasets show SDNM outperforms state-of-the-art approaches in speed and generalization, validating both the theory and the practical impact for large-scale hyperparameter tuning.
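To make the two ingredients named above concrete, here is a minimal sketch of a smoothed Fischer-Burmeister function combined with a damped (Armijo line search) Newton iteration. It is not the paper's SDNM: the scalar test problem `F`, the smoothing variant $\phi_\mu(a,b) = a + b - \sqrt{a^2 + b^2 + 2\mu}$ (one common form, whose roots satisfy $a > 0$, $b > 0$, $ab = \mu$), and all parameter values are assumptions chosen for illustration.

```python
import math

# Hypothetical scalar complementarity problem: find x with 0 <= x  _|_  F(x) >= 0.
# For this F the unique solution is x = 1 (where F(1) = 0).
def F(x):
    return x**3 + x - 2.0

def dF(x):
    return 3.0 * x**2 + 1.0

def phi(x, mu):
    """Smoothed Fischer-Burmeister residual; phi = 0 iff x > 0, F(x) > 0, x*F(x) = mu."""
    fx = F(x)
    return x + fx - math.sqrt(x * x + fx * fx + 2.0 * mu)

def dphi(x, mu):
    """Derivative of phi with respect to x (chain rule through F)."""
    fx = F(x)
    r = math.sqrt(x * x + fx * fx + 2.0 * mu)
    return (1.0 - x / r) + dF(x) * (1.0 - fx / r)

def damped_newton(x, mu, tol=1e-12, sigma=1e-4, beta=0.5, max_iter=100):
    """Newton's method on phi(., mu) = 0, damped by Armijo backtracking
    on the merit function theta(x) = 0.5 * phi(x, mu)**2."""
    for _ in range(max_iter):
        r = phi(x, mu)
        if abs(r) <= tol:
            break
        d = -r / dphi(x, mu)          # Newton direction
        alpha = 1.0
        # Armijo test: theta(x + alpha*d) <= (1 - 2*sigma*alpha) * theta(x)
        while (0.5 * phi(x + alpha * d, mu) ** 2
               > (1.0 - 2.0 * sigma * alpha) * 0.5 * r * r) and alpha > 1e-10:
            alpha *= beta             # shrink the step until sufficient decrease
        x += alpha * d
    return x

def solve(x0=3.0, mu0=1e-1, shrink=0.1, mu_min=1e-12):
    """Outer loop: drive the smoothing parameter mu toward zero,
    warm-starting each subproblem from the previous solution."""
    x, mu = x0, mu0
    while mu > mu_min:
        x = damped_newton(x, mu)
        mu *= shrink
    return x

if __name__ == "__main__":
    x_star = solve()
    print(x_star, F(x_star))
```

The outer loop mirrors the structure described in the summary: each smoothed subproblem is solved by a globalized Newton iteration, and the smoothing parameter is then decreased. In the paper the Newton system is the KKT system of NLP$_\epsilon$ rather than a scalar equation.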

Abstract

Bilevel hyperparameter optimization has received growing attention thanks to the fast development of machine learning. Due to the tremendous size of data sets, the scale of bilevel hyperparameter optimization problems can be extremely large, posing great challenges in designing efficient numerical algorithms. In this paper, we focus on solving large-scale mathematical programs with equilibrium constraints (MPEC) derived from hyperparameter selection of $L_1$ support vector classification ($L_1$-SVC). We propose a highly efficient smoothing damped Newton method (SDNM) for solving such MPECs. Compared with most existing algorithms, where subproblems are solved by packages, our approach fully exploits the structure of the MPEC and is therefore package-free. Moreover, the proposed SDNM converges to a C-stationary point under MPEC-LICQ, and the subproblem solver enjoys a quadratic convergence rate under proper assumptions. Extensive numerical results on LIBSVM datasets show the superior performance of SDNM over other state-of-the-art algorithms.

Paper Structure

This paper contains 14 sections, 12 theorems, 27 equations, 2 figures, 3 tables, 3 algorithms.

Key Result

Theorem 3.1

Let $\{ \epsilon_t \} \searrow 0$, let $v^{t}$ be a stationary point of (NLP$_{\epsilon_t}$), and let $v^*$ be any accumulation point of $\{ v^{t} \}$ at which MPEC-LICQ holds. If LICQ holds at $v^{t}$ for each $t$, then $v^*$ is a C-stationary point of the MPEC.
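
For readers unfamiliar with the term, the standard notion of C-stationarity can be written out as follows. The notation is an assumption made for illustration (a generic MPEC $\min_v f(v)$ subject to $0 \le G(v) \perp H(v) \ge 0$, with multipliers $u$ and $w$) and may differ from the paper's own definitions, which can also include further constraints. A feasible point $v^*$ is C-stationary if there exist multipliers $u, w$ such that

$$
\begin{aligned}
& \nabla f(v^*) - \sum_i u_i \nabla G_i(v^*) - \sum_i w_i \nabla H_i(v^*) = 0, \\
& u_i = 0 \quad \text{if } G_i(v^*) > 0, \qquad w_i = 0 \quad \text{if } H_i(v^*) > 0, \\
& u_i w_i \ge 0 \quad \text{if } G_i(v^*) = H_i(v^*) = 0 .
\end{aligned}
$$

The last line is what distinguishes C-stationarity from weak stationarity: on the biactive set the two multipliers may not have opposite signs.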

Figures (2)

  • Figure 1: $\log_{10} \|F_\epsilon(r^k)\|_2$ along iterations by SDNM
  • Figure 2: CPU time of three methods

Theorems & Definitions (14)

  • Definition 2.1
  • Definition 2.2
  • Theorem 3.1
  • Theorem 3.2
  • Lemma 4.1
  • Lemma 4.3
  • Theorem 4.4
  • Proposition 5.1
  • Theorem 5.2
  • Proposition 5.3
  • ...and 4 more