OUI Need to Talk About Weight Decay: A New Perspective on Overfitting Detection

Alberto Fernández-Hernández, Jose I. Mestre, Manuel F. Dolz, Jose Duato, Enrique S. Quintana-Ortí

TL;DR

The paper introduces the Overfitting-Underfitting Indicator (OUI), an activation-pattern-based metric that assesses a DNN’s expressive power and training dynamics without validation data. OUI compares per-sample activation patterns across layers via normalized Hamming distance and aggregates the results into a single score, which links the variability of intermediate activations to generalization behavior. The authors show that keeping OUI in an intermediate range (approximately [0.6, 0.8]) during early training aligns with the best validation performance, and that OUI converges faster than traditional metrics such as loss or accuracy, enabling efficient weight decay (WD) tuning. Across three CNN experiments (DenseNet-BC-100 on CIFAR-100, EfficientNet-B0 on TinyImageNet, ResNet-34 on ImageNet-1K), WD values that produce intermediate OUI yield superior generalization, with practical implications for automatic WD adjustment and potential applicability to other architectures.
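To make the comparison step concrete, here is a minimal sketch of the normalized Hamming distance between binary activation patterns, averaged over all pairs of samples in a batch. The function names, the example batch, and the plain averaging are assumptions for illustration. This summary does not state the paper's exact OUI formula, so the value computed here should not be read as OUI itself.

```python
from itertools import combinations


def normalized_hamming(p, q):
    """Fraction of positions where two equal-length binary patterns differ."""
    if len(p) != len(q):
        raise ValueError("patterns must have the same length")
    return sum(a != b for a, b in zip(p, q)) / len(p)


def mean_pairwise_distance(patterns):
    """Average normalized Hamming distance over all unordered sample pairs.

    Assumption: a plain mean is used as the aggregate; the paper's own
    aggregation into OUI may differ.
    """
    pairs = list(combinations(patterns, 2))
    if not pairs:
        raise ValueError("need at least two patterns")
    return sum(normalized_hamming(p, q) for p, q in pairs) / len(pairs)


# Hypothetical activation patterns of one hidden layer for three samples.
batch = [
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 0, 1, 1],
]
layer_score = mean_pairwise_distance(batch)
print(layer_score)
```

A score near 0 means the samples switch on almost the same neurons, while a larger score means their patterns differ more.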

Abstract

We introduce the Overfitting-Underfitting Indicator (OUI), a novel tool for monitoring the training dynamics of Deep Neural Networks (DNNs) and identifying optimal regularization hyperparameters. Specifically, we validate that OUI can effectively guide the selection of the Weight Decay (WD) hyperparameter by indicating whether a model is overfitting or underfitting during training without requiring validation data. Through experiments on DenseNet-BC-100 with CIFAR-100, EfficientNet-B0 with TinyImageNet and ResNet-34 with ImageNet-1K, we show that maintaining OUI within a prescribed interval correlates strongly with improved generalization and validation scores. Notably, OUI converges significantly faster than traditional metrics such as loss or accuracy, enabling practitioners to identify optimal WD (hyperparameter) values within the early stages of training. By leveraging OUI as a reliable indicator, we can determine early in training whether the chosen WD value leads the model to underfit the training data, overfit, or strike a well-balanced trade-off that maximizes validation scores. This enables more precise WD tuning for optimal performance on the tested datasets and DNNs. All code for reproducing these experiments is available at https://github.com/AlbertoFdezHdez/OUI.

Paper Structure

This paper contains 12 sections, 1 theorem, 7 equations, 3 figures.

Key Result

Proposition 1

A DNN with ReLU activations satisfies:

Figures (3)

  • Figure 1: Simple DNN to illustrate the concept of an activation pattern, capturing which neurons are active (green) or inactive (grey) for a specific input sample along the hidden layers of the DNN. For the illustrated input $x$, the activation patterns for the two hidden layers are $P_1(x) = [1, 0, 1, 1]$ and $P_2(x) = [1, 1, 0, 1, 0]$.
  • Figure 2: Comparison of OUI evolution and training dynamics across different architectures and datasets. Each column corresponds to an experiment: DenseNet-BC-100 on CIFAR-100 (left), EfficientNet-B0 on TinyImageNet (center), and ResNet-34 on ImageNet-1K (right). Each row represents a specific analysis: the first row shows OUI trajectories for multiple WD values, while the second, third, and fourth rows correspond to training and validation loss dynamics for low, intermediate, and high WD values, respectively.
  • Figure 3: MVA and final OUI as functions of WD for DenseNet-BC-100, EfficientNet-B0 and ResNet-34.
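The activation patterns described for Figure 1 can be reproduced with a small forward pass. The sketch below records, for each hidden layer of a fully connected ReLU network, which neurons have a positive pre-activation. The weights, biases, and input are hypothetical values chosen only so that the output matches the patterns quoted in the caption; they are not taken from the paper.

```python
def relu_layer(weights, biases, inputs):
    """Apply one fully connected ReLU layer; return (outputs, pattern)."""
    pre = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]
    outputs = [max(0.0, z) for z in pre]
    # A neuron counts as active when its pre-activation is strictly positive.
    pattern = [1 if z > 0 else 0 for z in pre]
    return outputs, pattern


def activation_patterns(layers, x):
    """Return one binary activation pattern per hidden layer for input x."""
    patterns = []
    h = x
    for weights, biases in layers:
        h, pattern = relu_layer(weights, biases, h)
        patterns.append(pattern)
    return patterns


# Hypothetical network: 2 inputs, hidden layers with 4 and 5 neurons.
layers = [
    (
        [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
        [0.0, 0.0, 0.0, 0.0],
    ),
    (
        [
            [1.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 1.0, 0.0],
            [0.0, 0.0, -1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0],
            [-1.0, 0.0, 0.0, 0.0],
        ],
        [0.0, 0.0, 0.0, 0.0, 0.0],
    ),
]
x = [1.0, 2.0]  # hypothetical input sample
patterns = activation_patterns(layers, x)
print(patterns)  # [[1, 0, 1, 1], [1, 1, 0, 1, 0]]
```

Collecting such patterns for several training samples gives the binary vectors that a pattern-based metric like OUI compares.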

Theorems & Definitions (6)

  • Definition 1
  • Definition 2
  • Proposition 1
  • Remark 1
  • Proof of Proposition 1
  • Remark 2