Robust support vector machines via conic optimization

Valentina Cepeda; Andrés Gómez; Shaoning Han

TL;DR

The paper tackles robust SVM learning under label uncertainty by deriving a conic-optimization-based convexification of the $0$-$1$ loss. A convex hull construction yields a strong convex relaxation, which corresponds to a separable, non-convex loss $\mathcal{L}^*(u;\gamma)$, and an SDP-based training problem in which the loss parameter $\gamma$ is chosen automatically so that the learning problem remains convex. Computational results show that the proposed conic loss matches the hinge loss on clean data and outperforms it in the presence of outliers, with reduced variance and practical runtimes on datasets with up to thousands of samples. The approach scales to moderate feature dimensions, can be extended to kernels, and provides a robust alternative to standard hinge-based SVMs in noisy environments.

Abstract

We consider the problem of learning support vector machines robust to uncertainty. It has been established in the literature that typical loss functions, including the hinge loss, are sensitive to data perturbations and outliers, and thus perform poorly in the setting considered. In contrast, using the 0-1 loss or a suitable non-convex approximation results in robust estimators, at the expense of large computational costs. In this paper we use mixed-integer optimization techniques to derive a new loss function that approximates the 0-1 loss better than existing alternatives, while preserving the convexity of the learning problem. In our computational results, we show that the proposed estimator is competitive with standard SVMs with the hinge loss in outlier-free regimes and better in the presence of outliers.
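The sensitivity of the hinge loss to outliers can be illustrated with a small numerical sketch. The margin values below are made up for illustration, and the convention that the 0-1 loss counts a margin of exactly zero as an error is an assumption, not something stated in the paper.

```python
import numpy as np

def hinge_loss(u):
    """Hinge loss max(0, 1 - u) as a function of the margin u = y * x^T w."""
    return np.maximum(0.0, 1.0 - u)

def zero_one_loss(u):
    """0-1 loss; treating u == 0 as an error is an assumed convention."""
    return (u <= 0).astype(float)

# Hypothetical margins of four well-classified points.
margins_clean = np.array([2.0, 1.5, 1.2, 3.0])
# The same data with one badly mislabeled point (hypothetical outlier).
margins_outlier = np.append(margins_clean, -50.0)

hinge_clean = float(hinge_loss(margins_clean).sum())
hinge_with_outlier = float(hinge_loss(margins_outlier).sum())
zero_one_clean = float(zero_one_loss(margins_clean).sum())
zero_one_with_outlier = float(zero_one_loss(margins_outlier).sum())

print(hinge_clean, hinge_with_outlier)        # hinge grows with the outlier's distance
print(zero_one_clean, zero_one_with_outlier)  # 0-1 loss charges at most 1 per point
```

A single outlier contributes an unbounded amount to the hinge objective, so it can dominate the fit, whereas its contribution to the 0-1 objective is capped at one.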

Paper Structure (16 sections, 4 theorems, 35 equations, 2 figures, 4 tables)

Introduction
The big-M formulation and hinge loss
Loss functions via convexification
Derivation of the convex hull
Interpretation as regularization
Implementation via conic optimization
Computations
Synthetic instances
Instance generation
Methods, metrics and implementation
Results
Real instances
Conclusions
Kernel formulations
Additional computational results with separable synthetic data and label noise
...and 1 more sections

Key Result

Theorem 1

The convex hull of $W_Q$ is described by the bound constraints $0\leq z\leq 1$ and an inequality in which $\gamma\in \mathbb{R}_+$ is the largest number such that $\bm{w}^\top \bm{Q}\bm{w}-\gamma(1-y(\bm{x}^\top \bm{w}))^2$ is convex, given by $\gamma=1/(\bm{x}^\top \bm{Q}^{-1}\bm{x})$.
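The value of $\gamma$ in Theorem 1 can be checked numerically. The sketch below assumes $\bm{Q}$ is positive definite and $y\in\{-1,1\}$, so that the Hessian of the quadratic expression in the theorem is $2(\bm{Q}-\gamma\bm{x}\bm{x}^\top)$. The random data and the helper names `critical_gamma` and `min_hessian_eig` are illustrative, not taken from the paper.

```python
import numpy as np

def critical_gamma(Q, x):
    """gamma = 1 / (x^T Q^{-1} x), the threshold stated in Theorem 1."""
    return 1.0 / float(x @ np.linalg.solve(Q, x))

def min_hessian_eig(Q, x, gamma):
    """Smallest eigenvalue of the Hessian of w^T Q w - gamma * (1 - y x^T w)^2.

    With y in {-1, 1} (assumption), y^2 = 1 and the Hessian is 2 (Q - gamma x x^T).
    """
    H = 2.0 * (Q - gamma * np.outer(x, x))
    return float(np.linalg.eigvalsh(H).min())

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Q = A @ A.T + 4.0 * np.eye(4)   # hypothetical positive definite matrix
x = rng.standard_normal(4)      # hypothetical feature vector

g = critical_gamma(Q, x)
eig_at = min_hessian_eig(Q, x, g)           # approximately zero: boundary of convexity
eig_below = min_hessian_eig(Q, x, 0.9 * g)  # positive: still convex
eig_above = min_hessian_eig(Q, x, 1.1 * g)  # negative: convexity is lost
print(g, eig_at, eig_below, eig_above)
```

The smallest eigenvalue changes sign exactly at the stated $\gamma$, which is consistent with it being the largest value that preserves convexity.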

Figures (2)

Figure 1: Non-convex robust losses approximating the 0-1 loss, as a function of $u=y_i\bm{x_i^\top w}$. Top row: loss functions from the literature: the normalized sigmoid loss \cite{mason1999boosting}, the $\psi$-learning loss \cite{shen2003psi} and the ramp loss \cite{wu2007robust}. Bottom row: the proposed conic loss in Proposition \ref{prop:loss} for different values of hyperparameters $\gamma_i$ (with $\lambda=1$). By solving the conic optimization problem \eqref{eq:primal}, $\bm{\gamma}$ is chosen automatically to ensure convexity of the ensuing learning problem.
Figure 2: Distribution of out-of-sample misclassification for data with clustered outliers and $\sigma=0.2$, as a function of the number of datapoints $n$. In instances with small $n$ (top row), the hinge estimator has a probability of breaking down, resulting in out-of-sample misclassification rates above 50%; the conic loss reduces the average misclassification rate by an order of magnitude. Moreover, the conic estimator performs consistently well in all settings.
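As a small illustration of why the literature losses in Figure 1 are called non-convex, the sketch below tests midpoint convexity for the hinge loss and for one common form of the ramp loss, $\min(1,\max(0,1-u))$. This particular normalization of the ramp loss is an assumption and may differ from the one plotted in the paper. The helper `violates_midpoint_convexity` is hypothetical.

```python
def hinge(u):
    """Hinge loss max(0, 1 - u)."""
    return max(0.0, 1.0 - u)

def ramp(u):
    """Ramp loss as a hinge truncated at 1 (assumed normalization)."""
    return min(1.0, max(0.0, 1.0 - u))

def violates_midpoint_convexity(f, a, b, tol=1e-12):
    """True if f at the midpoint exceeds the average of f(a) and f(b)."""
    mid = 0.5 * (a + b)
    return f(mid) > 0.5 * (f(a) + f(b)) + tol

# A convex function never violates the midpoint inequality.
ramp_violation = violates_midpoint_convexity(ramp, -1.0, 1.0)
hinge_violation = violates_midpoint_convexity(hinge, -1.0, 1.0)
print(ramp_violation, hinge_violation)

# The ramp loss stays bounded for a far-away outlier, the hinge loss does not.
print(ramp(-50.0), hinge(-50.0))
```

The truncation that makes the ramp loss bounded is also what breaks convexity, which is the trade-off the proposed conic loss is designed to address at the level of the learning problem.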

Theorems & Definitions (9)

Theorem 1
Lemma 1
proof
proof: Proof of Theorem \ref{theo:convexHull}
Proposition 1
proof
Theorem 2
proof
Remark 1
