Generalization Gap in Data Augmentation: Insights from Illumination

Jianqiang Xiao, Weiwen Guo, Junfeng Liu, Mengze Li

TL;DR

The study investigates how illumination, treated as a controllable visual representation variable, affects generalization when training data are augmented. By constructing a Full Spectrum Illumination Dataset (FSID) and a Singular Illumination Dataset (SID), the authors demonstrate a large generalization gap for models trained on SID. They propose illumination vector mapping augmentation (IVAD) and Bayesian optimization-based color augmentation (BO-DA) to mitigate this gap; both improve performance but do not match training on FSID under real-world illumination. The results highlight the limits of augmentation alone and emphasize the need for diverse, realistic visual features in training data to achieve robust generalization.

Abstract

In computer vision, data augmentation is widely used in deep learning to enrich the feature complexity of training datasets. However, the difference between the artificial features generated by data augmentation and natural visual features, and its effect on model generalization, has not been fully characterized. This study introduces the concept of "visual representation variables", defining the possible visual variations in a task as a joint distribution of these variables. We focus on the visual representation variable "illumination": we simulate a degradation of its distribution and examine how data augmentation techniques improve model performance on a classification task. Our goal is to investigate the differences in generalization between models trained with augmented data and those trained under real-world illumination conditions. Results indicate that applying various data augmentation methods significantly improves model performance. Yet a noticeable generalization gap remains, emphasizing the critical role of feature diversity in the training set for model generalization.

Paper Structure

This paper contains 19 sections, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Visual representation variables decomposition guided by task prior knowledge.
  • Figure 2: An assortment of 10 distinct toy dogs serves as recognition targets in our classification task. The variety in their visual features, such as shape, color, fur texture, and attire, highlights the complexity of our dataset and tests the classification models' ability to distinguish visual differences ranging from subtle to pronounced.
  • Figure 3: (a) A dual light source setup with supplementary lamps placed at 45-degree angles to ensure balanced illumination. (b) The light intensity meter for precise measurement of illumination conditions.
  • Figure 4: Toy 1 depicted under 15 illumination settings within the FSID, with light colors [Warm, Cool, Mixed] and intensities [-2, -1, 0, +1, +2]. The illumination intensity and color temperature are described in Tab. 1.
  • Figure 5: Establishing the illumination settings of FSID to generate extensive illumination vectors for augmenting the SID dataset. (a) 18% gray card, (b) scene assembled for data collection, and (c) images from the SID dataset of Toy 1, enhanced with illumination vectors under diverse illumination settings (detailed in Tab. 1).
  • ...and 1 more figure
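
The Figure 5 caption describes deriving illumination vectors from an 18% gray card and applying them to SID images. The exact IVAD procedure is not given in this summary, so the following is only a minimal sketch under a stated assumption: that an illumination change can be approximated by a diagonal (von Kries-style) per-channel gain estimated from gray-card patches. The function names `mean_rgb`, `illumination_gain`, and `relight` are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[float, float, float]  # RGB values as floats in [0, 1]


def mean_rgb(patch: Sequence[Pixel]) -> Pixel:
    """Average RGB over the pixels of a gray-card patch."""
    n = len(patch)
    if n == 0:
        raise ValueError("patch must contain at least one pixel")
    return tuple(sum(p[c] for p in patch) / n for c in range(3))


def illumination_gain(source_card: Sequence[Pixel],
                      target_card: Sequence[Pixel]) -> Pixel:
    """Per-channel gain mapping the source illumination to the target one.

    Assumption: a diagonal gain model, not the paper's confirmed method.
    """
    src = mean_rgb(source_card)
    tgt = mean_rgb(target_card)
    eps = 1e-8  # guards against division by zero on a black channel
    return tuple(t / max(s, eps) for s, t in zip(src, tgt))


def relight(image: Sequence[Pixel], gain: Pixel) -> List[Pixel]:
    """Apply the gain to every pixel and clip the result to [0, 1]."""
    return [
        tuple(min(1.0, max(0.0, v * g)) for v, g in zip(px, gain))
        for px in image
    ]


# Usage: gray card seen under the single SID light and under a warmer light.
sid_card = [(0.5, 0.5, 0.5)] * 4
warm_card = [(0.6, 0.5, 0.4)] * 4
gain = illumination_gain(sid_card, warm_card)
augmented = relight([(0.5, 0.5, 0.5), (0.9, 0.2, 0.1)], gain)
```

A diagonal gain can only rescale each color channel globally; it cannot reproduce shadows, specular highlights, or other spatially varying effects of real lighting, which is one plausible reason an augmentation of this kind would fall short of training on genuinely varied illumination.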