ConsistentFeature: A Plug-and-Play Component for Neural Network Regularization

RuiZhe Jiang, Haotian Lei

TL;DR

The paper introduces ConsistentFeature (CF), a plug-and-play regularization technique that treats random splits of the training set as separate i.i.d. domains and enforces feature consistency across them via an auxiliary discriminator. By adversarially aligning the feature distributions of the splits, CF promotes domain-invariant, generalizable representations and suppresses memorization, with minimal architectural assumptions and little computational overhead. Experiments across diverse datasets and architectures show that CF reduces overfitting, lowers validation loss, and improves accuracy, including on out-of-distribution data such as ImageNet-A, while remaining robust to hyperparameter choices. CF can be combined with traditional regularizers to further improve generalization, and it still helps when overfitting is not pronounced.

Abstract

Over-parameterized neural network models often show significant performance discrepancies between training and test sets, a phenomenon known as overfitting. To address this, researchers have proposed numerous regularization techniques tailored to various tasks and model architectures. In this paper, we introduce a simple perspective on overfitting: models learn different representations on different i.i.d. datasets. Based on this viewpoint, we propose an adaptive method, ConsistentFeature, that regularizes the model by constraining feature differences across random subsets of the same training set. Because it makes minimal prior assumptions, this approach is applicable to almost any architecture and task. Our experiments show that it effectively reduces overfitting, with low sensitivity to hyperparameters and minimal computational cost. It demonstrates particularly strong memory suppression and promotes normal convergence, even when the model has already started to overfit. Even in the absence of significant overfitting, our method consistently improves accuracy and reduces validation loss.
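
The idea of a discriminator that tries to tell random subsets of the training set apart from their features can be illustrated with a small sketch. This is not the authors' implementation: the logistic-regression discriminator, the helper names, the weight `lam`, and the way the discriminator loss is subtracted from the task loss are all assumptions made for illustration, and the "features" here are random numbers standing in for a network's outputs.

```python
import math
import random

def random_split(n, rng):
    """Assign each of n training samples to subset D_A (0) or D_B (1) at random."""
    labels = [0] * (n // 2) + [1] * (n - n // 2)
    rng.shuffle(labels)
    return labels

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def disc_prob(z, v, b):
    """Discriminator's predicted probability that feature vector z came from D_B."""
    return sigmoid(sum(vj * zj for vj, zj in zip(v, z)) + b)

def disc_loss(features, split, v, b):
    """Mean binary cross-entropy of the discriminator on the subset labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for z, s in zip(features, split):
        p = disc_prob(z, v, b)
        total -= s * math.log(p + eps) + (1 - s) * math.log(1 - p + eps)
    return total / len(features)

def disc_step(features, split, v, b, lr=0.1):
    """One gradient-descent step that improves the (assumed logistic) discriminator."""
    n = len(features)
    grad_v = [0.0] * len(v)
    grad_b = 0.0
    for z, s in zip(features, split):
        err = disc_prob(z, v, b) - s  # derivative of the cross-entropy w.r.t. the logit
        for j, zj in enumerate(z):
            grad_v[j] += err * zj / n
        grad_b += err / n
    return [vj - lr * gj for vj, gj in zip(v, grad_v)], b - lr * grad_b

def model_objective(task_loss, d_loss, lam=0.1):
    """Hypothetical feature-extractor loss: fit the task while raising the discriminator's loss."""
    return task_loss - lam * d_loss

rng = random.Random(0)
# Stand-in features; in practice these would be the network's outputs on training samples.
features = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(64)]
split = random_split(len(features), rng)

v, b = [0.0, 0.0], 0.0
loss_before = disc_loss(features, split, v, b)  # untrained discriminator: chance level
for _ in range(50):
    v, b = disc_step(features, split, v, b)
loss_after = disc_loss(features, split, v, b)
```

With balanced subsets, a discriminator at chance has a cross-entropy of ln 2, so a discriminator loss well below that value indicates that the features carry information about which subset a training sample was assigned to. In a full training loop the discriminator update and the model update would alternate; the schedule and the subset of samples used for the adversarial step are details this sketch does not attempt to reproduce.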

Paper Structure

This paper contains 17 sections, 7 equations, 11 figures, 5 tables.

Figures (11)

  • Figure 1: Validation loss and top-1 accuracy of ShuffleNetV2 on ImageNet200, with and without ConsistentFeature (CF).
  • Figure 2: t-SNE visualization of semantically similar categories (a, b) and semantically unrelated categories (c, d).
  • Figure 3: Illustration of the proposed method. The data are randomly split into two subsets ($D_A$, $D_B$), and the discriminator tries to predict each sample's subset from the model's feature outputs. Meanwhile, the model adversarially interacts with the discriminator using a subset of samples, making the two subsets harder to distinguish from their features.
  • Figure 4: Validation loss on CIFAR-100.
  • Figure 5: Validation loss on Webvision-mini.
  • ...and 6 more figures

Theorems & Definitions (1)

  • Definition 1