Wasserstein Distributionally Robust Multiclass Support Vector Machine

Michael Ibrahim; Heraldo Rozas; Nagi Gebraeel

TL;DR

This work uses Wasserstein distributionally robust optimization to develop a robust version of the multiclass support vector machine (SVM) characterized by the Crammer-Singer (CS) loss, and demonstrates that the model outperforms state-of-the-art OVA models in settings where the training data is highly imbalanced.

Abstract

We study the problem of multiclass classification in settings where the data features $\mathbf{x}$ and their labels $\mathbf{y}$ are uncertain. We identify that distributionally robust one-vs-all (OVA) classifiers often struggle in settings with imbalanced data. To address this issue, we use Wasserstein distributionally robust optimization to develop a robust version of the multiclass support vector machine (SVM) characterized by the Crammer-Singer (CS) loss. First, we prove that the CS loss is bounded from above by a Lipschitz continuous function for all $\mathbf{x} \in \mathcal{X}$ and $\mathbf{y} \in \mathcal{Y}$. We then exploit strong duality results to express the dual of the worst-case risk problem, and we show that the worst-case risk minimization problem admits a tractable convex reformulation due to the regularity of the CS loss. Moreover, we develop a kernel version of our proposed model to account for nonlinear class separation, and we show that it admits a tractable convex upper bound. We also propose a projected subgradient algorithm for a special case of our proposed linear model to improve scalability. Our numerical experiments demonstrate that our model outperforms state-of-the-art OVA models in settings where the training data is highly imbalanced. We also show through experiments on popular real-world datasets that our proposed model often outperforms its regularized counterpart, because the former accounts for uncertain labels whereas the latter does not.
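For readers unfamiliar with the CS loss, the following is a minimal sketch of the standard Crammer-Singer multiclass hinge loss for a single sample. It assumes an integer class label and a weight matrix with one row per class; the paper's own definition (in terms of $\mathbf{M}$ and a label vector $\mathbf{y}$) is not reproduced in this summary and may use different notation. The function name, argument layout, and toy numbers are illustrative assumptions, not the authors' code.

```python
def crammer_singer_loss(M, x, y):
    """Standard Crammer-Singer hinge loss for one sample (illustrative sketch).

    M : list of K weight rows, one per class (assumed layout, K >= 2).
    x : list of d feature values.
    y : integer index of the true class (assumed label encoding).
    """
    # Linear score of each class: <M[j], x>.
    scores = [sum(w * xi for w, xi in zip(row, x)) for row in M]
    # Margin violation of every wrong class j: 1 + score_j - score_y.
    violations = [1.0 + s - scores[y] for j, s in enumerate(scores) if j != y]
    # The loss is zero once the true class beats all others by a margin of 1.
    return max(0.0, max(violations))


# Toy example with 3 classes and 2 features (made-up numbers).
M = [[1.0, 0.0],
     [0.0, 1.0],
     [-1.0, -1.0]]
x = [2.0, 0.5]

loss_correct = crammer_singer_loss(M, x, y=0)  # true class has the top score
loss_wrong = crammer_singer_loss(M, x, y=1)    # true class is outscored
```

Unlike an OVA scheme, which trains one binary classifier per class, this loss couples all class scores in a single objective: only the most violating wrong class contributes to the penalty.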

Paper Structure (24 sections, 6 theorems, 31 equations, 7 figures, 4 tables, 1 algorithm)

Introduction
Related Work
Problem Setup and Preliminaries
Wasserstein Distributionally Robust Optimization
Wasserstein Distributionally Robust Multiclass SVM
Numerical Experiments
Experiment 1: Simulation Sensitivity Analysis
Experiment 2: Real-World Experiment
Conclusions and Future Work
Appendix A: Proofs of Theoretical Results
Proof of Lemma \ref{lem:prop_loss}
Proof of Theorem \ref{thm:linear_DRMSVM}
Proof of Theorem \ref{thm:kernel_DRMSVM}
Proof of Theorem \ref{thm:alg}
Proof of Proposition \ref{prop:conv}
...and 9 more sections

Key Result

Lemma 1

The CS loss $\ell_{\text{CS}}(\mathbf{M};\boldsymbol{\xi}) = \ell_{\text{CS}}(\mathbf{M};(\mathbf{x},\mathbf{y}))$ defined in Def. \ref{def:cs_loss} possesses the following properties:

Figures (7)

Figure 1: Results of the simulation experiments.
Figure 2: Surface plots of mCCR vs. $\varepsilon$ and $\kappa$ for the linear WDR-MSVM with 4 classes.
Figure 3: Surface plots of mCCR vs. $\varepsilon$ and $\kappa$ for the linear WDR-MSVM with 8 classes.
Figure 4: Surface plots of mCCR vs. $\varepsilon$ and $\kappa$ for the linear DR-OVA with 4 classes.
Figure 5: Surface plots of mCCR vs. $\varepsilon$ and $\kappa$ for the linear DR-OVA with 8 classes.
...and 2 more figures

Theorems & Definitions (14)

Definition 1: cramerSVM
Lemma 1
Definition 2: kant1960
Theorem 1
Remark 1
Theorem 2
Theorem 3
Proposition 1
Remark 2
proof
...and 4 more
