Out-of-Domain Robustness via Targeted Augmentations

Irena Gao, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, Percy Liang

TL;DR

This work tackles out-of-domain generalization by designing targeted data augmentations that randomize spurious domain-dependent features while preserving robust domain-dependent cues. Grounded in a linear-regression analysis, the authors show that underspecified problems (few training domains relative to the number of domain-dependent features) suffer high OOD risk without augmentation or with generic or domain-invariant augmentations, whereas targeted augmentations reduce the effective dimensionality to the robust subspace and yield favorable OOD risk bounds that improve with the number of training domains. Empirically, targeted augmentations achieve state-of-the-art OOD performance on three real-world datasets (iWildCam, Camelyon17-WILDS, BirdCalls) and outperform baselines across multiple modalities, with ablations highlighting the importance of preserving robust domain-dependent features. The results demonstrate a principled approach to OOD robustness that uses domain knowledge to selectively manipulate features, offering practical gains when deploying models to new environments, such as new camera or microphone locations.
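A rough way to see why few training domains make the problem underspecified: if the domain-dependent features are (nearly) constant within a domain, each training domain contributes essentially one vector of them, so $D$ domains span at most $D$ of the $p_\mathsf{dom}$ domain-dependent dimensions, and the training data leave a model's weights along the remaining directions unconstrained. The sketch below illustrates only that counting argument, not the authors' proof; the helpers `matrix_rank` and `unidentified_directions` and the integer-valued domain means are hypothetical choices made for the example.

```python
import random
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix (list of rows) via Gaussian elimination over exact rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        # Eliminate this column from every other row.
        for r in range(len(m)):
            if r != rank and m[r][col] != 0:
                factor = m[r][col] / m[rank][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def unidentified_directions(num_domains, p_dom, seed=0):
    """Hypothetical toy: each training domain contributes one p_dom-dimensional
    vector of domain-dependent features (its mean). Returns how many directions
    in that feature space the training domains leave completely unconstrained."""
    rng = random.Random(seed)
    domain_means = [[rng.randint(-5, 5) for _ in range(p_dom)]
                    for _ in range(num_domains)]
    # The D domain vectors span at most D dimensions, so this is >= p_dom - D.
    return p_dom - matrix_rank(domain_means)
```

In this toy view, randomizing the spurious coordinates independently per example, as targeted augmentations do, leaves only the robust domain-dependent coordinates to be pinned down from the $D$ domain-level vectors.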

Abstract

Models trained on one set of domains often suffer performance drops on unseen domains, e.g., when wildlife monitoring models are deployed in new camera locations. In this work, we study principles for designing data augmentations for out-of-domain (OOD) generalization. In particular, we focus on real-world scenarios in which some domain-dependent features are robust, i.e., some features that vary across domains are predictive OOD. For example, in the wildlife monitoring application above, image backgrounds vary across camera locations but indicate habitat type, which helps predict the species of photographed animals. Motivated by theoretical analysis in a linear setting, we propose targeted augmentations, which selectively randomize spurious domain-dependent features while preserving robust ones. We prove that targeted augmentations improve OOD performance, allowing models to generalize better with fewer domains. In contrast, existing approaches such as generic augmentations, which fail to randomize domain-dependent features, and domain-invariant augmentations, which randomize all domain-dependent features, both perform poorly OOD. In experiments on three real-world datasets, we show that targeted augmentations improve on the previous state of the art in OOD performance by 3.2 to 15.2 percentage points.
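The abstract separates three augmentation strategies by which features they randomize. A minimal sketch, assuming a toy example type that holds one scalar per feature group of the decomposition $x = f(x_\mathsf{obj}, x_\mathsf{d:robust}, x_\mathsf{d:spu}, x_\mathsf{noise})$: generic augmentation perturbs only the domain-independent nuisance feature, targeted augmentation resamples only the spurious domain-dependent feature, and domain-invariant augmentation resamples all domain-dependent features. The `Example` type, the three function names, and resampling from a pool of training examples are hypothetical illustrations, not the paper's implementation.

```python
import random
from dataclasses import dataclass, replace


# Hypothetical toy type, not the paper's code: one scalar per feature group.
@dataclass(frozen=True)
class Example:
    x_obj: float     # depends on the label, not the domain
    x_robust: float  # depends on the domain and the label (e.g., habitat)
    x_spu: float     # depends on the domain, not the label (e.g., camera angle)
    x_noise: float   # depends on neither
    domain: int
    y: int


def generic_aug(ex, rng):
    """Randomizes only the domain-independent nuisance feature."""
    return replace(ex, x_noise=rng.gauss(0.0, 1.0))


def targeted_aug(ex, pool, rng):
    """Resamples the spurious domain-dependent feature from a random training
    example; x_robust, x_obj, and the label are left untouched."""
    donor = rng.choice(pool)
    return replace(ex, x_spu=donor.x_spu)


def domain_invariant_aug(ex, pool, rng):
    """Resamples all domain-dependent features, so x_robust is discarded too."""
    donor = rng.choice(pool)
    return replace(ex, x_robust=donor.x_robust, x_spu=donor.x_spu)
```

The only difference between the targeted and domain-invariant variants is whether `x_robust` survives; in the wildlife example, that is the habitat signal carried by the background.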

Paper Structure (36 sections, 23 theorems, 92 equations, 11 figures, 12 tables, 2 algorithms)

Key Result

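Theorem 1 (Excess OOD risk without augmentations)

If $D < p_\mathsf{dom}$, i.e., there are fewer training domains $D$ than domain-dependent feature dimensions $p_\mathsf{dom}$, the expected excess OOD risk of the unaugmented model is bounded below as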

Figures (11)

  • Figure 1: We model inputs as $x = f(x_\mathsf{obj}, x_\mathsf{d:robust}, x_\mathsf{d:spu}, x_\mathsf{noise})$, where each of the four feature types either depends on the domain $d$ or not, and either depends on the output label $y$ or not, under the population distribution $P$. We study targeted augmentations, which randomize $x_\mathsf{d:spu}$ but preserve $x_\mathsf{d:robust}$, and we consider three real-world datasets (beery2021iwildcam; bandi2018detection; koh2021wilds), each of which has both robust and spurious domain-dependent features.
  • Figure 2: Augmentation examples for the three real-world datasets, including the targeted augmentations Copy-Paste (Same Y) for iWildCam, Stain Color Jitter for Camelyon17, and Copy-Paste + Jitter (Region) for BirdCalls. Targeted augmentations randomize $x_\mathsf{d:spu}$ but preserve $x_\mathsf{d:robust}$. In Section \ref{sec:ablations}, we compare against modified Copy-Paste augmentations, shown in the ablation column.
  • Figure 3: Targeted augmentations (red line) reduce OOD error substantially, while generic (orange) or unaugmented (blue) models require many training domains to attain low OOD error. Domain-invariant augmentations (green line) have constant, high error. We plot OOD RMSE for a varying number of training domains, with standard errors over 10 random seeds. We also plot the risk bounds from Section \ref{sec:theory} for the high-sample regime; because the bounds assume infinite data, we do not plot them for the low-sample case. The plotted bound for Theorem \ref{thm:ood-bound-tgt-zhu} is a more general version of the theorem (Appendix \ref{sec:app:proof-ood-bound-tgt-zhu}).
  • Figure 4: We plot the in-domain (ID) performance of methods against their out-of-domain (OOD) performance. Error bars are standard errors over replicates. Targeted augmentations significantly improve OOD performance over the nearest baseline, improving OOD Macro F1 on iWildCam from 33.3% $\to$ 36.5%, OOD average accuracy on Camelyon17 from 75.3% $\to$ 90.5%, and OOD Macro F1 on BirdCalls from 31.8% $\to$ 37.8%. Tables and additional details can be found in Appendix \ref{sec:app:experiment}.
  • Figure 5: Hospitals vary in the distribution of cancer stages they observe in patients, due to the different patient populations they serve. This in turn affects the causal feature for cancer prediction (cell morphology).
  • ...and 6 more figures
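Figure 2 names Copy-Paste (Same Y) as the targeted augmentation for iWildCam. Assuming, as the name suggests, that the segmented animal is pasted onto an empty background drawn from a camera that has also observed the same species, the idea can be sketched on small grayscale images stored as nested lists. The function name, its arguments, the per-camera dictionaries, and the fallback of returning the image unchanged are all hypothetical; the paper's actual pipeline (segmentation, resizing, image format) is not reproduced here.

```python
import random


def copy_paste_same_y(image, mask, label, backgrounds_by_camera,
                      species_by_camera, rng):
    """Hypothetical sketch of a Copy-Paste (Same Y)-style augmentation.

    image: 2D list of pixel values; mask: 2D list of 0/1 (1 = animal pixel).
    backgrounds_by_camera: camera id -> list of empty background images.
    species_by_camera: camera id -> set of labels observed at that camera.
    """
    # Only cameras that have seen this label are eligible, so the new
    # background stays consistent with a habitat where the species occurs.
    eligible = [cam for cam, seen in species_by_camera.items()
                if label in seen and backgrounds_by_camera.get(cam)]
    if not eligible:
        return image  # no suitable background: leave the example unaugmented
    background = rng.choice(backgrounds_by_camera[rng.choice(eligible)])
    # Keep animal pixels; take every other pixel from the new background.
    return [[px if m else bg for px, m, bg in zip(img_row, mask_row, bg_row)]
            for img_row, mask_row, bg_row in zip(image, mask, background)]
```

Restricting backgrounds to cameras that have seen label $y$ randomizes camera-specific details ($x_\mathsf{d:spu}$) while keeping habitat cues ($x_\mathsf{d:robust}$) consistent with the species.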

Theorems & Definitions (44)

  • Theorem 1: Excess OOD risk without augmentations
  • Proof sketch
  • Theorem 2: Excess OOD risk with targeted augmentations
  • Proof sketch
  • Theorem 3: Targeted augmentations improve OOD risk
  • Theorem 4: OOD error with domain-invariant augmentations
  • Proposition 1: Estimator without augmentation
  • Proof
  • Proposition 2: Estimator with generic augmentation
  • Proof
  • ...and 34 more