Benign Overfitting under Learning Rate Conditions for $α$ Sub-exponential Input

Kota Okudo, Kei Kobayashi

TL;DR

It is shown that under certain conditions on the dimensionality $p$ and the distance between the centers of the distributions, the misclassification error of the maximum margin classifier asymptotically approaches the noise level, the theoretical optimal value.

Abstract

This paper investigates the phenomenon of benign overfitting in binary classification problems with heavy-tailed input distributions, extending the analysis of maximum margin classifiers to $α$ sub-exponential distributions ($α\in (0, 2]$). This generalizes previous work focused on sub-gaussian inputs. We provide generalization error bounds for linear classifiers trained using gradient descent on unregularized logistic loss in this heavy-tailed setting. Our results show that, under certain conditions on the dimensionality $p$ and the distance between the centers of the distributions, the misclassification error of the maximum margin classifier asymptotically approaches the noise level, the theoretical optimal value. Moreover, we derive an upper bound on the learning rate $β$ for benign overfitting to occur and show that as the tail heaviness of the input distribution $α$ increases, the upper bound on the learning rate decreases. These results demonstrate that benign overfitting persists even in settings with heavier-tailed inputs than previously studied, contributing to a deeper understanding of the phenomenon in more realistic data environments.
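
The training setup described in the abstract can be illustrated with a minimal sketch. It is not the authors' code. It assumes a two-cluster model with centers $\pm\mu$, generalized normal noise with shape parameter `shape_param` rescaled to unit variance, labels flipped with probability `eta`, and gradient descent with step size `beta` on the mean (rather than summed) logistic loss. All function names and parameter values are hypothetical.

```python
import math
import numpy as np


def sample_generalized_normal(rng, shape_param, size):
    """Unit-variance samples with density proportional to exp(-|x/s|^shape_param)."""
    # If G ~ Gamma(1/shape_param, 1), then G^(1/shape_param) has density
    # proportional to exp(-x^shape_param) on x > 0; a random sign symmetrizes it.
    g = rng.gamma(1.0 / shape_param, 1.0, size=size)
    signs = rng.choice([-1.0, 1.0], size=size)
    raw = signs * g ** (1.0 / shape_param)
    # Variance of the unscaled law is Gamma(3/shape) / Gamma(1/shape).
    var = math.gamma(3.0 / shape_param) / math.gamma(1.0 / shape_param)
    return raw / math.sqrt(var)


def make_data(rng, n, mu, shape_param, eta):
    """Inputs x = clean_label * mu + noise; observed labels flipped w.p. eta."""
    clean = rng.choice([-1.0, 1.0], size=n)
    noise = sample_generalized_normal(rng, shape_param, (n, mu.size))
    x = clean[:, None] * mu[None, :] + noise
    flip = rng.random(n) < eta
    y = np.where(flip, -clean, clean)
    return x, y


def logistic_loss(w, x, y):
    """Mean unregularized logistic loss, computed stably."""
    return float(np.mean(np.logaddexp(0.0, -y * (x @ w))))


def train_gd(x, y, beta, steps):
    """Plain gradient descent on the mean logistic loss, starting from w = 0."""
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        margins = y * (x @ w)
        # sigmoid(-margin), written via logaddexp to avoid overflow
        weights = np.exp(-np.logaddexp(0.0, margins))
        grad = -(x * (y * weights)[:, None]).mean(axis=0)
        w -= beta * grad
    return w


def error_rate(w, x, y):
    """Fraction of points whose predicted sign disagrees with the label."""
    return float(np.mean(np.sign(x @ w) != y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_train, n_test, p, eta = 200, 1000, 1000, 0.05  # illustrative values
    mu = np.full(p, 6.0 / math.sqrt(p))               # assumed center, norm 6
    x_tr, y_tr = make_data(rng, n_train, mu, 0.5, eta)
    x_te, y_te = make_data(rng, n_test, mu, 0.5, eta)
    w = train_gd(x_tr, y_tr, beta=0.001, steps=500)
    print("train error:", error_rate(w, x_tr, y_tr))
    print("test error:", error_rate(w, x_te, y_te))
```

Because the test labels are also flipped with probability `eta`, no classifier can be expected to achieve a test error below `eta` in this sketch. That is the sense in which the noise level is the optimal value.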

Paper Structure

This paper contains 40 sections, 28 theorems, 118 equations, 6 figures, 3 tables.

Key Result

Theorem 4

For any $\alpha \in (0,2]$ and $\kappa\in(0,1)$, there exists a constant $c>0$ such that, under assumptions (A1)-(A5), for all large enough $C$, with probability at least $1-\delta$, the maximum margin classifier $w$ satisfies

Figures (6)

  • Figure 1: Boxplot of the estimated tail index $\xi$ for feature vector components extracted from the intermediate layers of a CNN with ReLU activation, trained on various datasets (CIFAR-10 \cite{krizhevsky2009learning}, CIFAR-100 \cite{krizhevsky2009learning}, Fashion-MNIST \cite{xiao2017fashion}, SVHN \cite{netzer2011reading}). The tail index $\xi$ represents the heaviness of the distribution tails, with smaller values indicating heavier tails. The Gaussian and Exponential distributions are included for comparison and were not passed through the CNN. The results indicate that the feature vectors for certain datasets have heavier-tailed distributions than the Gaussian distribution. Further details are given in Appendix \ref{appendix:intro_xi}.
  • Figure 2: Training and test errors versus dimension $p$ for a maximum margin classifier, with $n_\mathrm{train}=200$, $n_\mathrm{test}=1000$, and $p$ ranging from 100 to 1500. Data is generated from a heavy-tailed setting using a generalized normal distribution, as detailed in Section \ref{section:heavy_tailed_setting} and Appendix \ref{appendix:intro_benign}. The shape parameters are $\gamma = 0.25, 0.5, 2$, with variance normalized to 1. The noise level $\eta$ is $0.05$ (dotted line). Solid and dashed lines show training and test errors, respectively, with $95\%$ confidence intervals as error bars over 50 trials. Training error remains near zero, while test error stabilizes around the noise level as $p$ increases.
  • Figure 3: A heatmap showing the mean test error for $\beta=0.001$ with the horizontal axis representing the dimension $p$ and the vertical axis representing the shape parameter $\gamma$.
  • Figure 4: A heatmap showing the mean test error for $\gamma=0.8$ with the horizontal axis representing the dimension $p$ and the vertical axis representing the learning rate $\beta$.
  • Figure 5: A heatmap showing the mean test error for $p=4000$ with the horizontal axis representing the shape parameter $\gamma$ and the vertical axis representing the learning rate $\beta$.
  • ...and 1 more figure

Theorems & Definitions (45)

  • Definition 1: $\alpha$ sub-exponential random variable \cite{sambale2023some}
  • Example 2: Generalized normal distribution
  • Example 3: Generalized noisy rare-weak model
  • Theorem 4
  • Corollary 5
  • Proposition 6: A bound of the singular values of $X$
  • Corollary 7
  • Corollary 8
  • Corollary 9
  • Definition 10
  • ...and 35 more
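
For orientation on Definition 1, a commonly used formulation of $\alpha$ sub-exponentiality is via an Orlicz-type (quasi-)norm. The statement below is the standard one from the literature and is given only as a sketch. The paper's own definition may differ in constants or normalization.

$$
\|X\|_{\psi_\alpha} := \inf\left\{ t > 0 : \mathbb{E}\exp\!\left(\frac{|X|^\alpha}{t^\alpha}\right) \le 2 \right\}, \qquad X \text{ is } \alpha \text{ sub-exponential if } \|X\|_{\psi_\alpha} < \infty .
$$

Under this formulation, $\alpha = 2$ recovers sub-gaussian and $\alpha = 1$ sub-exponential random variables, and smaller $\alpha$ permits heavier tails. A generalized normal variable with density proportional to $\exp(-|x|^\gamma)$, as in Example 2, is $\alpha$ sub-exponential with $\alpha = \gamma$.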