Reduced Jeffries-Matusita distance: A Novel Loss Function to Improve Generalization Performance of Deep Classification Models

Mohammad Lashkari; Amin Gheibi

TL;DR

This paper introduces a distance called Reduced Jeffries-Matusita as a loss function for training deep classification models, with the aim of reducing over-fitting. Experiments show that the new distance measure significantly stabilizes training, enhances generalization, and improves model performance on the Accuracy and F1-score metrics, even when the training set is small.

Abstract

The generalization performance of deep neural networks in classification tasks is a major concern in machine learning research. Despite the widespread use of techniques that diminish over-fitting, such as data augmentation, pseudo-labeling, regularization, and ensemble learning, this performance still needs to be enhanced with other approaches. In recent years, it has been theoretically demonstrated that characteristics of the loss function, i.e., its Lipschitzness and maximum value, affect the generalization performance of deep neural networks, which can be used as guidance for proposing novel distance measures. In this paper, by analyzing these characteristics, we introduce a distance called Reduced Jeffries-Matusita as a loss function for training deep classification models to reduce over-fitting. In our experiments, we evaluate the new loss function on two different problems: image classification in computer vision and node classification in the context of graph learning. The results show that the new distance measure significantly stabilizes the training process, enhances the generalization ability, and improves the performance of the models on the Accuracy and F1-score metrics, even when the training set is small.
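To make the role of a loss function's maximum value concrete, the following minimal sketch compares cross-entropy with the classical Jeffries-Matusita distance for a one-hot label. It is an illustration only: it uses the standard Jeffries-Matusita distance, not the paper's reduced variant (whose definition is not given in this summary), and the function names are hypothetical.

```python
import math


def cross_entropy(probs, label):
    """Cross-entropy against a one-hot target: -log of the true-class probability."""
    return -math.log(probs[label])


def jeffries_matusita(probs, label):
    """Classical Jeffries-Matusita distance between `probs` and a one-hot target.

    Computes sqrt(sum_i (sqrt(p_i) - sqrt(y_i))^2), where y is one-hot at `label`.
    NOTE: this is the standard distance, not the paper's reduced variant.
    """
    total = 0.0
    for i, p in enumerate(probs):
        y = 1.0 if i == label else 0.0
        total += (math.sqrt(p) - math.sqrt(y)) ** 2
    return math.sqrt(total)


# A confidently wrong prediction: almost all mass on the wrong class.
confident_wrong = [1e-9, 1.0 - 1e-9]

# Cross-entropy grows without bound as the true-class probability approaches 0,
# while the Jeffries-Matusita distance never exceeds sqrt(2).
print("cross-entropy:     ", cross_entropy(confident_wrong, 0))
print("Jeffries-Matusita: ", jeffries_matusita(confident_wrong, 0))
print("upper bound sqrt(2):", math.sqrt(2))
```

The bounded range is the property of interest here: for probability vectors, the squared-root differences sum to at most 2, so the distance stays within [0, sqrt(2)] regardless of how wrong the prediction is.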

Paper Structure (18 sections, 11 theorems, 28 equations, 6 figures, 3 tables)

Introduction
Related Work
Preliminaries
Generalization Bounds
SGD Optimizer
Adam Optimizer
AdamW Optimizer
Loss Functions
Experiments
Image Classification
Problem Formulation
Dataset and Settings
Evaluation
Node Classification
Problem Formulation
...and 3 more sections

Key Result

Theorem 1

(Akbari et al., 2021) Assume SGD runs for $T$ iterations with an annealing learning rate $\eta_t$ to minimize the training error computed on $N$ samples. Let $\ell(\hat{\mathrm{y}}, \mathrm{y})$ be $\gamma$-Lipschitz, $\zeta$-smooth, and convex. Then SGD is $\beta$-uniformly stable and for every ...

Figures (6)

Figure 1: Evaluation in terms of the generalization error estimate and loss values (Model: ResNet50, Optimizer: Adam)
Figure 2: Evaluation in terms of the generalization error estimate and loss values (Model: ResNet50, Optimizer: AdamW)
Figure 3: Evaluation in terms of the generalization error estimate and loss values (Model: VGG16, Optimizer: SGD)
Figure 4: Evaluation in terms of the generalization error estimate and loss values (Model: GCN)
Figure 5: Evaluation in terms of the generalization error estimate and loss values (Model: GraphSAGE)
...and 1 more figure

Theorems & Definitions (17)

Definition 1: Partition
Definition 2: Generalization Error
Definition 3: Lipschitzness
Definition 4: Smoothness
Definition 5: Uniform Stability
Definition 6: BDC
Theorem 1
Theorem 2
Theorem 3
Theorem 4
...and 7 more
