Advancing Supervised Learning with the Wave Loss Function: A Robust and Smooth Approach

Mushir Akhtar; M. Tanveer; Mohd. Arshad

TL;DR

This paper incorporates the proposed wave loss function into the least squares setting of support vector machines (SVM) and twin support vector machines (TSVM), yielding two robust and smooth models, Wave-SVM and Wave-TSVM, and devises an iterative algorithm to solve the optimization problems of Wave-TSVM.

Abstract

The loss function plays a vital role in supervised learning. Training optimizes a predetermined loss function, so choosing an appropriate one can substantially affect the performance of the resulting model. In this paper, we present a novel asymmetric loss function for supervised machine learning, named the wave loss. It is robust to outliers, insensitive to noise, bounded, and smooth. Theoretically, we establish that the wave loss function is classification-calibrated. Building on these properties, we incorporate the wave loss function into the least squares setting of support vector machines (SVM) and twin support vector machines (TSVM), resulting in two robust and smooth models termed Wave-SVM and Wave-TSVM, respectively. To solve the optimization problem of Wave-SVM, we use the adaptive moment estimation (Adam) algorithm. Notably, this paper is the first to apply the Adam algorithm to solve an SVM model. Further, we devise an iterative algorithm to solve the optimization problems of Wave-TSVM. To evaluate Wave-SVM and Wave-TSVM empirically, we test them on benchmark UCI and KEEL datasets from diverse domains, with and without feature noise. Moreover, to demonstrate the applicability of Wave-SVM in the biomedical domain, we evaluate it on the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset. The experimental results show that Wave-SVM and Wave-TSVM achieve higher prediction accuracy than the baseline models.
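The abstract states that Wave-SVM is trained with Adam. The following is a minimal NumPy sketch of the linear case, assuming the wave loss $\mathfrak{L}_{wave}(u) = \frac{1}{\lambda}\left(1 - \frac{1}{1+\lambda u^2 e^{au}}\right)$ and the unconstrained objective $\frac{1}{2}\|w\|^2 + C\sum_i \mathfrak{L}_{wave}\left(1 - y_i(w^\top x_i + b)\right)$. Neither formula is reproduced in this summary, so both are assumptions; the function names (`wave_loss`, `wave_loss_grad`, `train_wave_svm_adam`), the hyperparameter defaults, and the toy data are hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def wave_loss(u, lam=1.0, a=1.0):
    """Assumed wave loss: (1/lam) * (1 - 1 / (1 + lam * u^2 * exp(a*u)))."""
    e = np.exp(np.clip(a * u, -50.0, 50.0))  # clipping only guards exp against overflow
    return (1.0 - 1.0 / (1.0 + lam * u**2 * e)) / lam


def wave_loss_grad(u, lam=1.0, a=1.0):
    """d/du of the assumed wave loss: u e^{au} (2 + a u) / (1 + lam u^2 e^{au})^2."""
    e = np.exp(np.clip(a * u, -50.0, 50.0))
    return u * e * (2.0 + a * u) / (1.0 + lam * u**2 * e) ** 2


def train_wave_svm_adam(X, y, C=1.0, lam=1.0, a=1.0, lr=0.01, n_iter=1000,
                        beta1=0.9, beta2=0.999, eps=1e-8):
    """Minimize 0.5*||w||^2 + C * sum_i wave_loss(1 - y_i (w.x_i + b)) with full-batch Adam."""
    n, d = X.shape
    theta = np.zeros(d + 1)   # parameters stacked as [w, b]
    m = np.zeros_like(theta)  # first-moment (mean) estimate
    v = np.zeros_like(theta)  # second-moment (uncentered variance) estimate
    for t in range(1, n_iter + 1):
        w, b = theta[:d], theta[d]
        u = 1.0 - y * (X @ w + b)            # residual u_i = 1 - y_i f(x_i)
        g_u = C * wave_loss_grad(u, lam, a)  # dLoss/du_i
        # Chain rule: du_i/dw = -y_i x_i and du_i/db = -y_i.
        grad_w = w - X.T @ (g_u * y)
        grad_b = -np.sum(g_u * y)
        grad = np.append(grad_w, grad_b)
        m = beta1 * m + (1.0 - beta1) * grad
        v = beta2 * v + (1.0 - beta2) * grad**2
        m_hat = m / (1.0 - beta1**t)         # bias-corrected moments
        v_hat = v / (1.0 - beta2**t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta[:d], theta[d]


def predict(X, w, b):
    return np.sign(X @ w + b)


# Usage on two hypothetical Gaussian blobs.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, (100, 2)), rng.normal(-2.0, 1.0, (100, 2))])
y = np.concatenate([np.ones(100), -np.ones(100)])
w, b = train_wave_svm_adam(X, y)
accuracy = np.mean(predict(X, w, b) == y)
print(w, b, accuracy)
```

Full-batch gradients keep the sketch short; sampling a mini-batch of rows before computing `u` would leave the Adam update itself unchanged.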

Paper Structure

This paper contains 19 sections, 1 theorem, 37 equations, 3 figures, 7 tables, and 2 algorithms.

Introduction and Motivation
Non-smooth loss functions
Smooth loss functions
Proposed work
Wave loss function
Theoretical analysis of the wave loss function
Formulation of Wave-SVM
Adam for linear Wave-SVM
Adam for non-linear Wave-SVM
Formulation of Wave-TSVM
Linear Wave-TSVM
Non-linear Wave-TSVM
Computational Complexity
Numerical Experiments
Experimental Setup and Parameter Selection
...and 4 more sections

Key Result

Theorem 2.1

The proposed loss function $\mathfrak{L}_{wave}(u)$ is classification-calibrated, i.e., $f_{\mathfrak{L}_{wave},\mathsf{P}}$ has the same sign as the Bayes classifier.
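Written out for binary labels, the expected loss behind Theorem 2.1 (the quantity plotted in Figure 3) splits into two terms. The block below assumes $Y=\{-1,+1\}$, that $\mathsf{P}(x)$ stands for $\mathsf{P}(y=1\mid x)$, and that $f_{\mathfrak{L}_{wave},\mathsf{P}}(x)$ is the pointwise minimizer of this expected loss; these readings follow the standard definition of classification calibration rather than text quoted from the paper.

```latex
\begin{align*}
\int_{Y}\mathfrak{L}_{wave}\bigl(1-yf(x)\bigr)\,d\mathsf{P}(y\vert x)
  &= \mathsf{P}(x)\,\mathfrak{L}_{wave}\bigl(1-f(x)\bigr)
   + \bigl(1-\mathsf{P}(x)\bigr)\,\mathfrak{L}_{wave}\bigl(1+f(x)\bigr), \\
f_{\mathfrak{L}_{wave},\mathsf{P}}(x)
  &= \operatorname*{arg\,min}_{t\in\mathbb{R}}
     \Bigl[\mathsf{P}(x)\,\mathfrak{L}_{wave}(1-t)
     + \bigl(1-\mathsf{P}(x)\bigr)\,\mathfrak{L}_{wave}(1+t)\Bigr], \\
f_{c}(x) &= \operatorname{sign}\bigl(2\mathsf{P}(x)-1\bigr)
  \quad\text{(Bayes classifier)}.
\end{align*}
```

Under these assumptions, classification calibration means $\operatorname{sign}\bigl(f_{\mathfrak{L}_{wave},\mathsf{P}}(x)\bigr) = f_c(x)$ whenever $\mathsf{P}(x) \neq 1/2$, which is the situation shown in the two panels of Figure 3.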

Figures (3)

Figure 1: Visual illustration of baseline loss functions. (a) Hinge loss function. (b) Pinball loss function with $\tau=0$ and $\tau=0.2$. (c) Ramp loss function with $\theta=1$. (d) Squared hinge loss function. (e) Smooth pinball loss function with $\tau=0.8$ and $\tau=1$. (f) LINEX loss function with $a=0.5$, $a=1$, and $a=1.5$.
Figure 2: Illustration of the wave loss function for fixed $\lambda=1$ and different values of $a$. Subfigures (a), (b), (c), and (d) show that the value of $a$ controls the strength of the penalty on correctly classified and misclassified samples.
Figure 3: Plot of $\int_{Y}\mathfrak{L}_{wave}\left(1-yf(x)\right) d\mathsf{P} (y\vert x)$ against $f(x)$ for different values of $\mathsf{P}(x)$. (a) The case where $\mathsf{P}(x)$ is greater than $1/2$. (b) The case where $\mathsf{P}(x)$ is less than $1/2$.
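The three figures compare loss curves in the variable $u = 1 - yf(x)$ and illustrate calibration. The NumPy sketch below evaluates the same quantities without plotting: common textbook forms of the Figure 1 baselines, the wave loss, and a grid search for the minimizer of the conditional risk from Figure 3. Assumptions: the wave loss is taken to be $\frac{1}{\lambda}\left(1 - \frac{1}{1+\lambda u^2 e^{au}}\right)$, which this summary does not state; the ramp loss is taken as $\min(\max(0,u),\theta)$; the smooth pinball loss is omitted because its exact form is not given here; all function names are hypothetical. The grid search checks a single setting ($\lambda = a = 1$) numerically and is no substitute for the proof of Theorem 2.1.

```python
import numpy as np


def wave_loss(u, lam=1.0, a=1.0):
    # Assumed wave loss form; bounded above by 1/lam and equal to 0 at u = 0.
    return (1.0 - 1.0 / (1.0 + lam * u**2 * np.exp(a * u))) / lam


# Common forms of the Figure 1 baselines, written in u = 1 - y f(x).
def hinge(u):
    return np.maximum(0.0, u)


def pinball(u, tau=0.2):
    return np.where(u >= 0, u, -tau * u)


def ramp(u, theta=1.0):
    # Hinge loss truncated at theta (one common ramp parameterization).
    return np.minimum(np.maximum(0.0, u), theta)


def squared_hinge(u):
    return np.maximum(0.0, u) ** 2


def linex(u, a=1.0):
    return np.exp(a * u) - a * u - 1.0


def conditional_risk(t, p, loss):
    """Expected loss of predicting f(x) = t when P(y = 1 | x) = p, labels in {-1, +1}."""
    return p * loss(1.0 - t) + (1.0 - p) * loss(1.0 + t)


def risk_minimizer(p, loss, grid=None):
    """Grid search for the value of f(x) that minimizes the conditional risk."""
    if grid is None:
        grid = np.linspace(-5.0, 5.0, 10001)
    return grid[np.argmin(conditional_risk(grid, p, loss))]


# Figure 2: with lam = 1 and a > 0, u > 0 (margin violated) costs more than u < 0.
print(wave_loss(1.0, a=1.0), wave_loss(-1.0, a=1.0))
# Figure 3: the minimizer should carry the sign of 2p - 1.
for p in (0.3, 0.7):
    print(p, risk_minimizer(p, wave_loss))
```

Swapping the sign of $a$ mirrors the curve, since $\mathfrak{L}_{wave}(u; a) = \mathfrak{L}_{wave}(-u; -a)$ under the assumed form, which is one way to read the asymmetry shown in Figure 2.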

Theorems & Definitions (2)

Theorem 2.1
Proof
