Wasserstein Distributionally Robust Multiclass Support Vector Machine

Michael Ibrahim, Heraldo Rozas, Nagi Gebraeel

TL;DR

This work uses Wasserstein distributionally robust optimization to develop a robust version of the multiclass support vector machine (SVM) characterized by the Crammer-Singer (CS) loss, and demonstrates that the model outperforms state-of-the-art OVA models in settings where the training data is highly imbalanced.

Abstract

We study the problem of multiclass classification for settings where data features $\mathbf{x}$ and their labels $\mathbf{y}$ are uncertain. We observe that distributionally robust one-vs-all (OVA) classifiers often struggle in settings with imbalanced data. To address this issue, we use Wasserstein distributionally robust optimization to develop a robust version of the multiclass support vector machine (SVM) characterized by the Crammer-Singer (CS) loss. First, we prove that the CS loss is bounded from above by a Lipschitz continuous function for all $\mathbf{x} \in \mathcal{X}$ and $\mathbf{y} \in \mathcal{Y}$. We then exploit strong duality results to express the dual of the worst-case risk problem, and we show that the worst-case risk minimization problem admits a tractable convex reformulation due to the regularity of the CS loss. Moreover, we develop a kernel version of our proposed model to account for nonlinear class separation, and we show that it admits a tractable convex upper bound. We also propose a projected subgradient method for a special case of our proposed linear model to improve scalability. Our numerical experiments demonstrate that our model outperforms state-of-the-art OVA models in settings where the training data is highly imbalanced. We also show through experiments on popular real-world datasets that our proposed model often outperforms its regularized counterpart, because the former accounts for uncertain labels whereas the latter does not.
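For readers unfamiliar with the CS loss named above, the following is a minimal sketch of the standard Crammer-Singer multiclass hinge loss for a single sample. It is not taken from the paper: the function name `crammer_singer_loss`, the layout of `M` as one weight vector per class, and the use of an integer class index for the label are all illustrative assumptions, and the paper's own Definition 1 (not reproduced in this excerpt) may use a different label encoding.

```python
def crammer_singer_loss(M, x, y):
    """Standard Crammer-Singer multiclass hinge loss for one sample.

    Assumed (hypothetical) conventions, not taken from the paper:
      M: list of per-class weight vectors, M[j] scores class j
      x: feature vector as a list of floats
      y: integer index of the true class
    """
    # Linear score of each class: <M[j], x>
    scores = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in M]
    # Add a unit margin to every class except the true one.
    augmented = [s + (0.0 if j == y else 1.0) for j, s in enumerate(scores)]
    # The loss is zero only if the true class beats all others by at least 1.
    return max(augmented) - scores[y]


# Toy example with 3 classes and 2 features (made-up numbers).
M = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
x = [2.0, 0.5]
loss_correct = crammer_singer_loss(M, x, 0)  # true class has the top score by a wide margin
loss_wrong = crammer_singer_loss(M, x, 1)    # true class is outscored by class 0
```

Unlike an OVA scheme, which trains one binary classifier per class, this loss compares the true-class score against all competing classes within a single objective.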

Paper Structure

This paper contains 24 sections, 6 theorems, 31 equations, 7 figures, 4 tables, 1 algorithm.

Key Result

Lemma 1

The CS loss $\ell_{\text{CS}}(\mathbf{M};\boldsymbol{\xi}) = \ell_{\text{CS}}(\mathbf{M};(\mathbf{x},\mathbf{y}))$ defined in Definition 1 possesses the following properties:

Figures (7)

  • Figure 1: Results of the simulation experiments.
  • Figure 2: Surface plots of mCCR vs. $\varepsilon$ and $\kappa$ for the linear WDR-MSVM with 4 classes.
  • Figure 3: Surface plots of mCCR vs. $\varepsilon$ and $\kappa$ for the linear WDR-MSVM with 8 classes.
  • Figure 4: Surface plots of mCCR vs. $\varepsilon$ and $\kappa$ for the linear DR-OVA with 4 classes.
  • Figure 5: Surface plots of mCCR vs. $\varepsilon$ and $\kappa$ for the linear DR-OVA with 8 classes.
  • ...and 2 more figures

Theorems & Definitions (14)

  • Definition 1: cramerSVM
  • Lemma 1
  • Definition 2: kant1960
  • Theorem 1
  • Remark 1
  • Theorem 2
  • Theorem 3
  • Proposition 1
  • Remark 2
  • Proof
  • ...and 4 more