Generative Hints

Andy Dimnaku, Abdullah Yusuf Kavranoğlu, Yaser Abu-Mostafa

TL;DR

We propose generative hints, a training methodology that directly enforces known invariances over the entire input space and consistently outperforms standard data augmentation when learning the same property.

Abstract

Data augmentation is widely used in vision to introduce variation and mitigate overfitting by enabling models to learn invariant properties, such as spatial invariance. However, data augmentation alone does not fully capture these properties, since it teaches the property only on transformations of the training data. We propose generative hints, a training methodology that directly enforces known invariances in the entire input space. Our approach leverages a generative model trained on the training set to approximate the input distribution and generate unlabeled images, which we refer to as virtual examples. These virtual examples are used to enforce functional properties known as hints. In generative hints, although the training dataset is fully labeled, the model is trained in a semi-supervised manner on both the classification and hint objectives, using the unlabeled virtual examples to guide the model in learning the desired hint. Across datasets, architectures, and loss functions, generative hints consistently outperform standard data augmentation when learning the same property. On popular fine-grained visual classification benchmarks, we achieve up to a 1.78% top-1 accuracy improvement (0.63% on average) over fine-tuned models with data augmentation, and an average performance boost of 1.286% on the CheXpert X-ray dataset.

Paper Structure

This paper contains 23 sections, 4 equations, 2 figures, 7 tables, 1 algorithm.

Figures (2)

  • Figure 1: Depiction of virtual examples applied to each dataset. The datasets shown are Stanford Cars (top left), CUB-200-2011 Caltech Birds (top right), FGVC Aircraft (bottom left), and Oxford Flowers (bottom right). For each dataset, from left to right, we show an original training image, a virtual example sampled from the generative model, and the corresponding hint-transformed image.
  • Figure 2: Correlation between the generative hint loss on generated samples and the hint loss on real training data, plotted against the FID of the generative model. The horizontal dashed line indicates zero correlation. The vertical dashed line highlights the approximate FID threshold ($\sim 11$) below which the generative model begins to provide a meaningful learning signal. The red point marks FID 5.58, where the correlation reaches 0.91.

Theorems & Definitions (8)

  • Definition 1: Hint
  • Definition 2: Invariance Hint
  • Definition 3: Virtual Example
  • Definition 4: Invariance Hint on Virtual Examples
  • Definition 5: Flip Invariance Hint
  • Definition 6: Spatial Invariance Hint
  • Definition 7: Symmetric KL Hint Loss
  • Definition 8: MSE Hint Loss
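
The definitions listed above are not reproduced on this page, so the following is only a minimal sketch of how a flip invariance hint might be scored on unlabeled virtual examples with a symmetric KL or an MSE hint loss. Everything in it is an assumption made for illustration: the function names, the 0.5 averaging factor in the symmetric KL, the `eps` smoothing term, and the two-class `toy_model`, which stands in for a real classifier. It is not the paper's implementation.

```python
import math


def hflip(image):
    """Horizontally flip an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in image]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions (eps is an assumed smoothing term)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def symmetric_kl_hint_loss(p, q):
    """Assumed form: average of KL(p || q) and KL(q || p)."""
    return 0.5 * (kl(p, q) + kl(q, p))


def mse_hint_loss(p, q):
    """Mean squared difference between the two predicted distributions."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) / len(p)


def toy_model(image):
    """Hypothetical 2-class model whose logits are left-half and right-half brightness."""
    width = len(image[0])
    left = sum(sum(row[: width // 2]) for row in image)
    right = sum(sum(row[width // 2:]) for row in image)
    return softmax([left, right])


def flip_hint_loss(model, virtual_examples, loss_fn):
    """Average disagreement between predictions on x and on its flipped copy.

    No labels are needed: the hint only compares the model with itself.
    """
    losses = [loss_fn(model(x), model(hflip(x))) for x in virtual_examples]
    return sum(losses) / len(losses)


# Two tiny stand-ins for images sampled from a generative model.
virtual_examples = [
    [[0.9, 0.1], [0.8, 0.2]],  # asymmetric: the toy model is not flip invariant here
    [[0.5, 0.5], [0.5, 0.5]],  # symmetric: flipping changes nothing
]

kl_loss = flip_hint_loss(toy_model, virtual_examples, symmetric_kl_hint_loss)
mse_loss = flip_hint_loss(toy_model, virtual_examples, mse_hint_loss)
```

Because the toy model is deliberately sensitive to left versus right brightness, both losses come out positive on the asymmetric example and zero on the symmetric one. In training, a term of this kind would be minimized alongside the usual classification loss on the labeled data; how the two objectives are weighted is not stated on this page.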