Reduced Jeffries-Matusita distance: A Novel Loss Function to Improve Generalization Performance of Deep Classification Models

Mohammad Lashkari, Amin Gheibi

TL;DR

The paper introduces a distance called Reduced Jeffries-Matusita as a loss function for training deep classification models to reduce over-fitting, and shows that it stabilizes the training process significantly, enhances generalization ability, and improves model performance on the Accuracy and F1-score metrics, even when the training set is small.

Abstract

The generalization performance of deep neural networks in classification tasks is a major concern in machine learning research. Despite the widespread use of techniques that mitigate over-fitting, such as data augmentation, pseudo-labeling, regularization, and ensemble learning, generalization performance still needs to be improved through other approaches. In recent years, it has been theoretically demonstrated that characteristics of the loss function, namely its Lipschitzness and maximum value, affect the generalization performance of deep neural networks, and these results can guide the design of novel distance measures. In this paper, by analyzing these characteristics, we introduce a distance called Reduced Jeffries-Matusita as a loss function for training deep classification models to reduce over-fitting. In our experiments, we evaluate the new loss function on two different problems: image classification in computer vision and node classification in graph learning. The results show that the new distance measure stabilizes the training process significantly, enhances generalization ability, and improves model performance on the Accuracy and F1-score metrics, even when the training set is small.
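To make the idea of a bounded distance-based loss concrete, here is a minimal sketch of the classical Jeffries-Matusita (Matusita) distance between a softmax output and a one-hot label. This is an illustration under assumptions: it implements one common form of the classical distance, not the Reduced variant the paper proposes, whose definition is not given in this summary. The function names `jeffries_matusita`, `softmax`, and `one_hot` are hypothetical helpers chosen for this example.

```python
import math


def jeffries_matusita(p, q):
    """Classical JM distance: sqrt(sum_i (sqrt(p_i) - sqrt(q_i))^2).

    Assumption: this is the classical form, NOT the paper's Reduced variant.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return math.sqrt(
        sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    )


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, num_classes):
    """One-hot label vector (hypothetical helper for this sketch)."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


# Example: a 3-class prediction whose true class is 0.
probs = softmax([2.0, 0.5, -1.0])
target = one_hot(0, 3)
loss = jeffries_matusita(probs, target)
print(loss)
```

For a one-hot target on class $c$, this classical form simplifies to $\sqrt{2 - 2\sqrt{\hat{y}_c}}$, so it never exceeds $\sqrt{2}$, whereas cross-entropy grows without bound as $\hat{y}_c \to 0$. That bounded maximum is the kind of loss characteristic the abstract refers to.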

Paper Structure (18 sections, 11 theorems, 28 equations, 6 figures, 3 tables)

Key Result

Theorem 1

[akbari2021does] Assume SGD runs for $T$ iterations with an annealing learning rate $\eta_t$ to minimize the training error computed on $N$ samples. Let $\ell(\mathrm{\hat{y}}, \mathrm{y})$ be $\gamma$-Lipschitz, $\zeta$-smooth, and convex. Then SGD is $\beta$-uniformly stable and for every ...

Figures (6)

  • Figure 1: Evaluation in terms of the generalization error estimate and loss values (Model: ResNet50, Optimizer: Adam)
  • Figure 2: Evaluation in terms of the generalization error estimate and loss values (Model: ResNet50, Optimizer: AdamW)
  • Figure 3: Evaluation in terms of the generalization error estimate and loss values (Model: VGG16, Optimizer: SGD)
  • Figure 4: Evaluation in terms of the generalization error estimate and loss values (Model: GCN)
  • Figure 5: Evaluation in terms of the generalization error estimate and loss values (Model: GraphSAGE)
  • ...and 1 more figure

Theorems & Definitions (17)

  • Definition 1: Partition
  • Definition 2: Generalization Error
  • Definition 3: Lipschitzness
  • Definition 4: Smoothness
  • Definition 5: Uniform Stability
  • Definition 6: BDC
  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • ...and 7 more