TransformMix: Learning Transformation and Mixing Strategies from Data

Tsz-Him Cheung, Dit-Yan Yeung

TL;DR

TransformMix is an automated approach that learns better transformation and mixing augmentation strategies from data. It applies learned transformations and mixing masks to create compelling mixed images that contain correct and important information for the target tasks.

Abstract

Data augmentation improves the generalization power of deep learning models by synthesizing more training samples. Sample-mixing is a popular data augmentation approach that creates additional data by combining existing samples. Recent sample-mixing methods, like Mixup and CutMix, adopt simple mixing operations to blend multiple inputs. Although such a heuristic approach yields performance gains in some computer vision tasks, it mixes the images blindly and does not adapt to different datasets automatically. A mixing strategy that is effective for a particular dataset often fails to generalize to other datasets. If not properly configured, these methods may create misleading mixed images, which jeopardizes the effectiveness of sample-mixing augmentation. In this work, we propose an automated approach, TransformMix, to learn better transformation and mixing augmentation strategies from data. In particular, TransformMix applies learned transformations and mixing masks to create compelling mixed images that contain correct and important information for the target tasks. We demonstrate the effectiveness of TransformMix on multiple datasets in transfer learning, classification, object detection, and knowledge distillation settings. Experimental results show that our method achieves better performance and efficiency than strong sample-mixing baselines.
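
For readers unfamiliar with the two baselines named in the abstract, the following is a minimal NumPy sketch of Mixup and CutMix as they are commonly described: Mixup blends two images with one scalar weight, and CutMix pastes a random rectangle from one image into the other. The function names, the `rng` argument, and the one-hot label format are illustrative assumptions, not code from the paper.

```python
import numpy as np

def mixup(x_i, x_j, y_i, y_j, lam):
    """Blend two images and their one-hot labels with a single scalar weight."""
    x = lam * x_i + (1.0 - lam) * x_j
    y = lam * y_i + (1.0 - lam) * y_j
    return x, y

def cutmix(x_i, x_j, y_i, y_j, lam, rng):
    """Paste a random rectangle of x_j into x_i; labels follow the area ratio."""
    h, w = x_i.shape[:2]
    # Rectangle sized to cover roughly (1 - lam) of the image area.
    cut_h = int(h * np.sqrt(1.0 - lam))
    cut_w = int(w * np.sqrt(1.0 - lam))
    # Random box centre; the box is clipped to the image borders.
    cy, cx = int(rng.integers(0, h)), int(rng.integers(0, w))
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    x = x_i.copy()
    x[top:bottom, left:right] = x_j[top:bottom, left:right]
    # Recompute the weight from the pasted area, since clipping may shrink the box.
    lam_adj = 1.0 - (bottom - top) * (right - left) / (h * w)
    y = lam_adj * y_i + (1.0 - lam_adj) * y_j
    return x, y, lam_adj
```

Neither function looks at image content when choosing what to keep, which is the "blind" mixing the abstract refers to.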

Paper Structure

This paper contains 22 sections, 4 equations, 9 figures, 11 tables, 1 algorithm.

Figures (9)

  • Figure 1: Visual comparison of different sample-mixing methods on a dog and bear image. Our method better preserves the important region of the input images.
  • Figure 2: Illustrations of the intermediate results during mixing. From left to right, the columns show the visualizations of the input images, CAMs, transformed images and predicted masks.
  • Figure 3: Overview of TransformMix. The black arrows indicate the forward pass; the red arrows indicate the gradient flow when training the spatial transformation network $f_s$ and mask prediction network $f_m$; the blue arrow indicates the gradient flow when training the task network $g$. Steps 1 and 2: the CAMs of the input images $(x_i, x_j)$ are extracted from the pre-trained teacher $f_t$. Step 3: the CAMs, input images and sampled mixing coefficient $\lambda$ are supplied to the mixing module to compute the transformations $(\phi_i, \phi_j)$ and mixing masks $(m_i, m_j)$ to create the mixed image $x'$, which is used in step 4a: training the mixing module, or step 4b: training the task network.
  • Figure 4: Illustrations of the effect when using different temperature values $\tau$. The third image is the mixed result with learned $\tau=0.08$.
  • Figure 5: Illustration of the mixed outputs with increasing value of the mixing coefficient $\lambda$.
  • ...and 4 more figures
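
The captions for Figures 3 to 5 mention per-pixel mixing masks $(m_i, m_j)$, a mixing coefficient $\lambda$, and a temperature $\tau$, but do not state how the masks are computed. The sketch below is only a hand-written stand-in to illustrate how those three quantities can interact. It assumes the masks are a temperature-scaled two-way softmax over $\lambda$-weighted saliency maps, and it omits the learned transformations $(\phi_i, \phi_j)$ entirely. In the paper the masks come from a learned network $f_m$, which this code does not reproduce; all function names are hypothetical.

```python
import numpy as np

def soft_masks(s_i, s_j, lam, tau):
    """Turn two saliency maps into per-pixel masks that sum to 1 (assumed form)."""
    # Weight each map by its share of the mix, then divide by the temperature.
    logits = np.stack([lam * s_i, (1.0 - lam) * s_j]) / tau
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    e = np.exp(logits)
    m = e / e.sum(axis=0, keepdims=True)
    return m[0], m[1]

def mix(x_i, x_j, m_i, m_j):
    """Combine two H x W x C images using H x W masks."""
    return m_i[..., None] * x_i + m_j[..., None] * x_j
```

Under this assumed form, a small $\tau$ pushes the masks toward hard 0/1 selection of the more salient image at each pixel, while a large $\tau$ approaches uniform blending.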