Regularizing Neural Network Training via Identity-wise Discriminative Feature Suppression

Avraham Chapman, Lingqiao Liu

TL;DR

ASIF (Adversarial Suppression of Identity Features) addresses overfitting in deep networks by suppressing features that identify individual training instances while preserving class-discriminative cues. It introduces an adversarial framework with an Identifier head that has dedicated per-class parameters, plus a Dynamic Gradient Reversal (DGR) layer that balances training without manual tuning, so that the network learns features shared across each class. Empirical results on CIFAR10 and Fashion-MNIST show improved generalization in small-data regimes and robustness to noisy labels, along with the ability to flag likely incorrect labels. The approach offers a path toward more robust and potentially domain-invariant representations, with future work aimed at unsupervised identification and broader domain transfer scenarios.

Abstract

It is well known that deep neural networks have a strong fitting capability and can easily achieve a low training error even with randomly assigned class labels. When the number of training samples is small, or the class labels are noisy, networks tend to memorise patterns specific to individual instances to minimise the training error. This leads to overfitting and poor generalisation performance. This paper explores a remedy: suppressing the network's tendency to rely on instance-specific patterns for empirical error minimisation. The proposed method is based on an adversarial training framework. It suppresses features that can be used to identify individual instances among samples within each class. As a result, classifiers use only features that are both discriminative across classes and common within each class. We call our method Adversarial Suppression of Identity Features (ASIF), and demonstrate its usefulness in boosting generalisation accuracy when faced with small datasets or noisy labels. Our source code is available.

Paper Structure (21 sections, 2 equations, 4 figures, 11 tables, 1 algorithm)


Figures (4)

  • Figure 1: Architecture of our proposed ASIF network. (a) A shared Feature Extractor $\Psi$ extracts features for use by all downstream tasks. A Classifier module $h_C$ performs the Classification task, while an Identifier module $h_I$ performs the Identification task. (b) The Identifier module contains shared parameters that are trained on all samples, as well as dedicated parameters for each class of samples producing $C$ outputs, one for each class. There is a Dynamic Gradient Reversal (DGR) layer between the shared and per-class parameters.
  • Figure 2: Classification loss and training and test accuracy versus training epoch when training on CIFAR10 with 80% symmetric label noise. Dynamic Gradient Reversal (DGR) reduces overfitting, as indicated by the classification loss dropping more slowly. Note also that the DGR training and test accuracies remain in lockstep, while the DANN accuracies diverge later in training.
  • Figure 3: CIFAR10 results. ASIF confers a clear accuracy improvement over the baseline training method (Cross Entropy) with noisy labels, especially at the high-noise end of the range. GCE and PHuber training results are included for comparison.
  • Figure 4: Accuracy of a single-layer classifier trained on the output of feature extractors trained with Cross Entropy loss (red) and ASIF (blue), using between 1 and 100 of the most important features.
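
The gradient reversal idea behind the Identifier in Figure 1(b) can be sketched as follows. This is a minimal illustration in plain Python, not the authors' implementation: the class name `GradReverse`, the helper `head_grad_wrt_input`, the squared-error loss, and the fixed strength `lam` are all assumptions made for the example. In particular, the paper's DGR sets the reversal strength dynamically, and that rule is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GradReverse:
    """Identity in the forward pass; multiplies the gradient by -lam in the backward pass."""

    # Hypothetical fixed strength. The paper's DGR chooses this dynamically.
    lam: float = 1.0

    def forward(self, x: List[float]) -> List[float]:
        # Activations pass through unchanged.
        return list(x)

    def backward(self, grad_out: List[float]) -> List[float]:
        # Everything upstream of this layer sees the negated, scaled gradient.
        return [-self.lam * g for g in grad_out]


def head_grad_wrt_input(inputs: List[float], head_weights: List[float], target: float) -> List[float]:
    """Gradient of 0.5 * (w . x - target)^2 with respect to x (toy stand-in for an identifier loss)."""
    pred = sum(w * x for w, x in zip(head_weights, inputs))
    return [(pred - target) * w for w in head_weights]


# Toy values: output of the shared Identifier parameters, and one per-class head.
shared_out = [1.0, 2.0]
head_weights = [0.5, -1.0]

grl = GradReverse(lam=0.5)
grad_from_head = head_grad_wrt_input(grl.forward(shared_out), head_weights, target=1.0)
grad_to_shared = grl.backward(grad_from_head)
```

With these toy values the gradient arriving from the per-class head is `[-1.25, 2.5]`, and the layer passes `[0.625, -1.25]` upstream. Parameters before the layer are therefore pushed to make identification harder, while the per-class head itself still trains to identify instances.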