Optimal Layer Selection for Latent Data Augmentation

Tomoumi Takase, Ryo Karakida

TL;DR

This work tackles the challenge of selecting latent-layer positions for data augmentation in neural networks. It introduces AdaLASE, a gradient-based method that assigns and dynamically updates per-layer acceptance ratios $q_i$ to determine where latent-DA should be applied, using a proxy loss $L_{DA}$ and a pseudo-validation objective to guide optimization. Across diverse datasets and models, AdaLASE achieves accuracy comparable to or better than Uniform DA, while revealing that optimal layers for augmentation depend on data regime and task. The proposed framework reduces heuristic layer selection and computational costs, with potential extensions to optimize augmentation types and multiple methods jointly, enhancing automated latent-DA policy search in transfer learning and beyond.

Abstract

While data augmentation (DA) is generally applied to input data, several studies have reported that applying DA to hidden layers in neural networks, i.e., feature augmentation, can improve performance. However, previous studies have not carefully considered which layers DA should be applied to: DA is often applied randomly and uniformly across layers or only to a specific layer, leaving room for arbitrariness. Thus, in this study, we investigated the trends of suitable layers for applying DA in various experimental configurations, e.g., training from scratch, transfer learning, various dataset settings, and different models. In addition, to adjust the suitable layers for DA automatically, we propose the adaptive layer selection (AdaLASE) method, which updates the ratio at which DA is performed for each layer by gradient descent during training. The experimental results obtained on several image classification datasets indicate that the proposed AdaLASE method altered the ratio as expected and achieved high overall test accuracy.
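
To make the idea of per-layer acceptance ratios concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the ratios form a probability vector over the candidate positions (P0, P1, ...), that one position is sampled per iteration, and that the ratios are kept above a lower limit (the figure captions mention such a limit). The function names, the learning rate, and the way the ratios are projected back onto valid values are all hypothetical. The gradient values are passed in as plain numbers, because how the paper derives them from its proxy loss and pseudo-validation data is not reproduced here.

```python
import random


def sample_da_position(ratios, rng=random):
    """Sample the index of the position (P0, P1, ...) where DA is applied.

    `ratios` are the per-position acceptance ratios, used as sampling weights.
    A uniform list reproduces the uniform-ratio baseline.
    """
    return rng.choices(range(len(ratios)), weights=ratios, k=1)[0]


def update_ratios(ratios, grads, lr=0.1, lower_limit=0.05):
    """Take one gradient-descent step on the ratios (hypothetical rule).

    `grads` stands in for the gradient of some objective with respect to each
    ratio; computing it is outside the scope of this sketch. After the step,
    the ratios are mapped back so that each is >= lower_limit and they sum
    to 1. Requires len(ratios) * lower_limit <= 1.
    """
    n = len(ratios)
    stepped = [q - lr * g for q, g in zip(ratios, grads)]
    # Mass above the floor for each position (zero if the step went below it).
    excess = [max(q - lower_limit, 0.0) for q in stepped]
    total_excess = sum(excess)
    if total_excess == 0.0:
        # Every ratio fell to the floor: fall back to uniform ratios.
        return [1.0 / n] * n
    free_mass = 1.0 - n * lower_limit
    return [lower_limit + free_mass * e / total_excess for e in excess]


if __name__ == "__main__":
    q = [1.0 / 3] * 3                # start from uniform ratios
    fake_grads = [1.0, 0.0, -1.0]    # placeholder values, not from the paper
    q = update_ratios(q, fake_grads)
    print(q, sample_da_position(q))
```

In this sketch a positive gradient lowers a position's ratio and a negative gradient raises it, so positions whose augmentation appears harmful are sampled less often while the lower limit keeps every position reachable.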

Paper Structure

This paper contains 10 sections, 7 equations, 7 figures, 2 tables, and 1 algorithm.

Figures (7)

  • Figure 1: Positions to apply DA in several neural networks.
  • Figure 2: Examples of latent-DA. The sample in Example 1 was selected from the COIL-20 dataset, and the sample in Example 2 was selected from the STL-10 dataset. These samples were augmented using cutout or translation at P0, P1, and P2 in ResNet18.
  • Figure 3: Transitions of the acceptance ratio for P0 in the proposed AdaLASE method when training an MLP on the CIFAR-10 dataset. The results of 20 runs with different initializations are shown. Test data were used rather than the pseudo-validation data in the AdaLASE method. The models were trained from scratch.
  • Figure 4: Difference in acceptance ratios when the lower limit was varied. An MLP was trained on the CIFAR-10 dataset with cutout. Test data were used rather than the pseudo-validation data in the AdaLASE method. The models were trained from scratch.
  • Figure 5: Difference between the AdaLASE method and the uniform ratio method in the number of iterations in which the worst layer was selected. Test data were used rather than the pseudo-validation data in the AdaLASE method. The models were trained from scratch.
  • ...and 2 more figures