A Safe Screening Rule with Bi-level Optimization of $ν$ Support Vector Machine

Zhiji Yang, Wanyi Chen, Huan Zhang, Yitian Xu, Lei Shi, Jianhua Zhao

TL;DR

The paper proposes a safe screening rule with bi-level optimization for $\nu$-SVM (SRBO-$\nu$-SVM), which screens out inactive samples before training and reduces computational cost without sacrificing prediction accuracy.

Abstract

The support vector machine (SVM) has achieved many successes in machine learning, especially on small-sample problems. As a well-known extension of the traditional SVM, the $ν$ support vector machine ($ν$-SVM) has shown outstanding performance owing to its strong model interpretability. However, its training overhead remains a challenge for large-scale problems. To address this issue, we propose a safe screening rule with bi-level optimization for $ν$-SVM (SRBO-$ν$-SVM), which screens out inactive samples before training and reduces the computational cost without sacrificing prediction accuracy. SRBO-$ν$-SVM is rigorously derived by combining the Karush-Kuhn-Tucker (KKT) conditions, the variational inequalities of convex problems, and the $ν$-property. We also develop an efficient dual coordinate descent method (DCDM) to further improve computational speed. Finally, we propose a unified SRBO framework that can accelerate many SVM-type models, and we apply it successfully to the one-class SVM. Experimental results on 6 artificial data sets and 30 benchmark data sets verify the effectiveness and safety of the proposed methods in supervised and unsupervised tasks.

Paper Structure (20 sections, 8 theorems, 40 equations, 8 figures, 12 tables, 2 algorithms)


Key Result

Lemma 1

(Variational Inequality; Güler, Foundations of Optimization, 2010) Let $F(\bm{\alpha})$ be a differentiable function on an open set containing the convex set $\mathcal{A}$. If $\bm{\alpha^{*}} \in \mathcal{A}$ is a local minimum of $F(\bm{\alpha})$ over $\mathcal{A}$, then the following inequality holds: $\langle \nabla F(\bm{\alpha^{*}}), \bm{\alpha} - \bm{\alpha^{*}} \rangle \geq 0$ for all $\bm{\alpha} \in \mathcal{A}$. Here, $\nabla F(\bm{\alpha^{*}})$ denotes the gradient of $F$ at $\bm{\alpha^{*}}$.
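The variational inequality in Lemma 1 states that, at a constrained minimizer $\bm{\alpha^{*}}$, the gradient makes a non-negative inner product with every feasible direction $\bm{\alpha} - \bm{\alpha^{*}}$. The following is a minimal numerical sketch of that statement, not code from the paper: the quadratic objective (`Q`, `p`), the box $[0,1]^2$ used as $\mathcal{A}$, and the projected-gradient solver are all illustrative assumptions.

```python
import random

# Hypothetical convex objective F(a) = 0.5 * a^T Q a - p^T a (Q is positive definite).
Q = [[2.0, 1.0], [1.0, 2.0]]
p = [3.0, -2.0]


def gradient(alpha):
    """Gradient of F at alpha: Q @ alpha - p."""
    return [sum(Q[i][j] * alpha[j] for j in range(2)) - p[i] for i in range(2)]


def project(alpha):
    """Euclidean projection onto the assumed feasible set A = [0, 1]^2."""
    return [min(max(a, 0.0), 1.0) for a in alpha]


def projected_gradient(alpha, step=0.2, iters=200):
    """Plain projected gradient descent; the step is below 2 / lambda_max(Q) = 2/3."""
    for _ in range(iters):
        g = gradient(alpha)
        alpha = project([a - step * gi for a, gi in zip(alpha, g)])
    return alpha


alpha_star = projected_gradient([0.5, 0.5])
g_star = gradient(alpha_star)

# Check <grad F(alpha*), alpha - alpha*> >= 0 on random feasible points.
rng = random.Random(0)


def inner_at_random_point():
    alpha = [rng.random(), rng.random()]
    return sum(gi * (a - s) for gi, a, s in zip(g_star, alpha, alpha_star))


min_inner = min(inner_at_random_point() for _ in range(1000))
print(alpha_star, g_star, min_inner)
```

For this example the unconstrained minimizer lies outside the box, so the constrained minimizer sits at a corner where the gradient is non-zero. That is exactly the situation where the inequality, rather than the condition $\nabla F = 0$, characterizes optimality.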

Figures (8)

  • Figure 1: An illustration of $\nu$-SVM on a 2-D artificial data set. $X^{(1)}$ and $X^{(2)}$ represent two features of the samples, respectively. The black dotted lines are support hyperplanes of two classes. The solid black line represents the decision hyperplane.
  • Figure 2: Illustrations of $\mathcal{W}$ when given $\bm{w_{0}}$. $\mathcal{W}$ is a spherical region with center $c$ and radius $r ^{\frac{1}{2}}$. The difference between (a) and (b) is in the size of the sphere, which is only due to the different selection of $\bm{\delta}$.
  • Figure 3: Mathematical framework of our SRBO-$\nu$-SVM.
  • Figure 4: Classification graphs of SRBO-$\nu$-SVM on three normally distributed data sets (respectively in linear and nonlinear case, $\mu_{\pm}=\pm 1, \pm 2, \pm 5$), nonlinear case on circle, exclusive and spiral data. The blue and red points represent the positive and negative training instances, respectively. In each graph, the black solid line is the decision boundary, and the other two lines are support hyperplanes. Each graph corresponds to the classifier under optimal parameters, and 'Accuracy' represents the corresponding training accuracy. The green points correspond to the samples deleted in $\mathcal{L}$, and the yellow points correspond to the samples removed in $\mathcal{R}$. 'Screening Ratio' is the average result during the whole parameter selection process by SRBO.
  • Figure 5: Speedup Ratio of SRBO-$\nu$-SVM in linear and nonlinear cases.
  • ...and 3 more figures
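Figure 2 describes $\mathcal{W}$ as a sphere with center $c$ and radius $r^{\frac{1}{2}}$. Assuming $\mathcal{W}$ bounds the weight vector $\bm{w}$, a spherical region is convenient because the range of a linear score $\langle \bm{w}, \bm{x} \rangle$ over it has a closed form by the Cauchy-Schwarz inequality: $\langle c, \bm{x} \rangle \pm r^{\frac{1}{2}} \|\bm{x}\|$. The sketch below illustrates only this bound. The function name and the numbers are hypothetical, and the rule that SRBO-$\nu$-SVM builds on top of such bounds is not reproduced here.

```python
import math


def inner_product_bounds(center, radius_sq, x):
    """Range of <w, x> over the ball {w : ||w - center||^2 <= radius_sq}.

    By Cauchy-Schwarz, the extremes are <center, x> -/+ sqrt(radius_sq) * ||x||.
    """
    dot = sum(c * xi for c, xi in zip(center, x))
    slack = math.sqrt(radius_sq) * math.sqrt(sum(xi * xi for xi in x))
    return dot - slack, dot + slack


# Hypothetical numbers: center c = (1, 0), squared radius r = 0.25, sample x = (3, 4).
lower, upper = inner_product_bounds([1.0, 0.0], 0.25, [3.0, 4.0])
print(lower, upper)
```

A smaller sphere gives a tighter interval, which is consistent with the caption's remark that panels (a) and (b) differ only in the size of the sphere.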

Theorems & Definitions (11)

  • Lemma 1
  • Theorem 1
  • proof
  • Corollary 1
  • proof
  • Lemma 2
  • Theorem 2
  • proof
  • Corollary 2
  • Corollary 3
  • ...and 1 more