
Domain Generalization by Rejecting Extreme Augmentations

Masih Aminbeidokhti, Fidel A. Guerrero Peña, Heitor Rapela Medeiros, Thomas Dubail, Eric Granger, Marco Pedersoli

TL;DR

Domain generalization is improved by a simple augmentation strategy that expands the transformation space and, for each sample, selects between weak and wider transforms using a reward that blends diversity and semantic consistency. The reward $R(\tilde{x}, z) = (1-\lambda) R_{div}(\tilde{x}, z) - \lambda R_{con}(\tilde{x}, z)$ leverages an EMA teacher to enforce consistency while encouraging varied inputs, and training alternates between reward computation and gradient updates. Evaluations on five DG benchmarks (PACS, VLCS, OfficeHome, TerraIncognita, DomainNet) show performance competitive with the state of the art, with TeachDCAug$^{label}$ often achieving the best averages and good robustness when using ViT backbones. Overall, the work demonstrates that carefully controlled aggressive augmentation, filtered by a per-sample reward, is a practical and effective tool for domain generalization across diverse visual domains.
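The per-sample selection described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes $R_{div}$ and $R_{con}$ have already been computed as scalars for each candidate, and the names `Candidate`, `reward`, and `select` are hypothetical. Because $R_{con}$ enters the reward with a negative sign, the sketch reads it as a penalty that grows when the augmented sample is less consistent; that reading is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candidate:
    """One augmented view of a sample (hypothetical container)."""
    name: str     # e.g. "weak" or "wider"
    r_div: float  # diversity term R_div, assumed precomputed
    r_con: float  # consistency term R_con, assumed precomputed


def reward(r_div: float, r_con: float, lam: float) -> float:
    """R = (1 - lambda) * R_div - lambda * R_con, as in the TL;DR formula."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam is assumed to lie in [0, 1]")
    return (1.0 - lam) * r_div - lam * r_con


def select(candidates: Sequence[Candidate], lam: float) -> Candidate:
    """Keep the candidate with the highest reward."""
    return max(candidates, key=lambda c: reward(c.r_div, c.r_con, lam))


# Made-up numbers, for illustration only.
weak = Candidate("weak", r_div=0.2, r_con=0.1)
wider_safe = Candidate("wider", r_div=0.9, r_con=0.3)
wider_extreme = Candidate("wider", r_div=0.9, r_con=2.0)

chosen_safe = select([weak, wider_safe], lam=0.5)
chosen_extreme = select([weak, wider_extreme], lam=0.5)
```

With these illustrative values, the wider transform is kept when its consistency penalty is small and rejected in favor of the weak one when the penalty is large, which mirrors the accept/reject behavior described in the text.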

Abstract

Data augmentation is one of the most effective techniques for regularizing deep learning models and improving their recognition performance in a variety of tasks and domains. However, this holds for standard in-domain settings, in which the training and test data follow the same distribution. For the out-of-domain case, where the test data follow a different and unknown distribution, the best recipe for data augmentation is unclear. In this paper, we show that for out-of-domain and domain generalization settings, data augmentation can provide a conspicuous and robust improvement in performance. To do that, we propose a simple training procedure: (i) use uniform sampling on standard data augmentation transformations; (ii) increase the strength of the transformations to account for the higher data variance expected when working out-of-domain; and (iii) devise a new reward function to reject extreme transformations that can harm the training. With this procedure, our data augmentation scheme achieves a level of accuracy that is comparable to or better than state-of-the-art methods on benchmark domain generalization datasets. Code: https://github.com/Masseeh/DCAug
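Steps (i) and (ii) of the procedure can be illustrated with a short Python sketch: an operation is drawn uniformly at random, then a strength is drawn uniformly from that operation's range, with the wider space simply using larger ranges. The operation names and numeric ranges below are hypothetical placeholders chosen for illustration; they are not the values used in the paper or in the DCAug repository.

```python
import random
from typing import Dict, Tuple

Ranges = Dict[str, Tuple[float, float]]

# Hypothetical operations and strength ranges, for illustration only.
WEAK_RANGES: Ranges = {
    "rotate_degrees": (0.0, 15.0),
    "shear": (0.0, 0.1),
    "brightness": (0.0, 0.3),
}
WIDER_RANGES: Ranges = {
    "rotate_degrees": (0.0, 135.0),
    "shear": (0.0, 0.9),
    "brightness": (0.0, 0.99),
}


def sample_transform(ranges: Ranges, rng: random.Random) -> Tuple[str, float]:
    """Step (i): pick an operation uniformly, then a strength uniformly."""
    op = rng.choice(sorted(ranges))  # sorted for a reproducible order
    low, high = ranges[op]
    return op, rng.uniform(low, high)


rng = random.Random(0)
weak_op, weak_strength = sample_transform(WEAK_RANGES, rng)
# Step (ii): same sampler, larger strength ranges.
wider_op, wider_strength = sample_transform(WIDER_RANGES, rng)
```

Step (iii), rejecting extreme samples drawn from the wider space, is handled separately by the reward function and is not part of this sketch.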

Paper Structure (21 sections, 8 equations, 10 figures, 13 tables, 1 algorithm)

Figures (10)

  • Figure 1: A conceptual illustration of our method. The inner circle and outer circle represent the space of weak (safe) and wider (possibly harmful) augmentations, respectively. Our method is able to automatically select for each combination of data samples and augmentation a wider transformation (when safe) or reject it when unsafe. This is achieved with the help of a reward function (represented as the yellow color gradient) that compares the diversity and the consistency of an augmented sample (see Section 4 for more details). In the illustration, given an image $x$, we present two possible paths of augmentation. For the blue path, the wide augmentation has a high diversity and high consistency, and therefore it is selected (green box). For the purple path, although the wide augmentation has high diversity, it also has low consistency, therefore the transformation is rejected (red box), and the weak transformation is used instead as augmentation.
  • Figure 2: Sample transformations from TA with wide and wider search space on PACS dataset. For each transformation, the first row shows the range of transformed samples with wide search space and the second row with wider. We see that the wider space can lead to more variety but also extreme and detrimental transformations that do not keep the semantics of the image. This motivates us to use wider transformations but find a way to reject the extreme ones.
  • Figure 3: Overview of DCAug$^{domain}$ procedure for rejecting extreme augmentations. After calculating $R_{div}$ and $R_{con}$ for $\mathcal{T}_{weak}$ and $\mathcal{T}_{wider}$, our method selects the transformation with the highest reward (green box) and updates the label classifier $f_{\theta}$ and domain student $h_{\phi}$ using the transformed input $\tilde{x}$. DCAug$^{label}$ and TeachDCAug$^{label}$ also follow the same procedure by replacing $d$ and $h_{\phi}$ by $y$ and $f_{\theta}$, respectively (see the supplementary materials for more visual examples of the selected images).
  • Figure 4: Evolution over epochs of Weak vs Wider augmentations on the four domains of the PACS dataset. The title of each plot shows the domain, out-of-domain accuracies, and the average ratio of each augmentation for the entire training run. In the plots, we clearly see that for Cartoon and Sketch, the two domains that are farther from the pre-trained model on ImageNet and with lower performance, strong transformations are preferred over weak ones.
  • Figure 5: Consistency and diversity for different methods for in-domain (left) and out-of-domain (right) settings on the PACS dataset. Color represents the classification accuracy on the test set. For high accuracy, we need a good trade-off between diversity and consistency.
  • ...and 5 more figures