Out-of-Domain Robustness via Targeted Augmentations
Irena Gao, Shiori Sagawa, Pang Wei Koh, Tatsunori Hashimoto, Percy Liang
TL;DR
This work tackles out-of-domain (OOD) generalization by designing targeted data augmentations that randomize spurious domain-dependent features while preserving robust domain-dependent ones. Using a linear-regression analysis, the authors show that in underspecified problems, where there are few training domains relative to the number of domain-dependent features, unaugmented training, generic augmentations, and domain-invariant augmentations all incur high OOD risk. Targeted augmentations, by contrast, reduce the effective dimensionality to the robust subspace and yield favorable OOD risk bounds that scale with the number of training domains. Empirically, targeted augmentations achieve state-of-the-art OOD performance on three real-world datasets (iWildCam, Camelyon17-WILDS, BirdCalls) spanning multiple modalities, and ablations highlight the importance of preserving robust domain-dependent features. The results support a principled approach to OOD robustness that uses domain knowledge to decide which features to randomize, with practical gains when deploying to new environments such as unseen camera or microphone locations.
Abstract
Models trained on one set of domains often suffer performance drops on unseen domains, e.g., when wildlife monitoring models are deployed in new camera locations. In this work, we study principles for designing data augmentations for out-of-domain (OOD) generalization. In particular, we focus on real-world scenarios in which some domain-dependent features are robust, i.e., some features that vary across domains are predictive OOD. For example, in the wildlife monitoring application above, image backgrounds vary across camera locations but indicate habitat type, which helps predict the species of photographed animals. Motivated by theoretical analysis on a linear setting, we propose targeted augmentations, which selectively randomize spurious domain-dependent features while preserving robust ones. We prove that targeted augmentations improve OOD performance, allowing models to generalize better with fewer domains. In contrast, existing approaches such as generic augmentations, which fail to randomize domain-dependent features, and domain-invariant augmentations, which randomize all domain-dependent features, both perform poorly OOD. In experiments on three real-world datasets, we show that targeted augmentations set new states-of-the-art for OOD performance by 3.2-15.2 percentage points.
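As a rough illustration of the distinction drawn above, the sketch below treats an input as a flat feature vector and contrasts a targeted augmentation with a domain-invariant one. It is a toy under stated assumptions, not the authors' implementation: the feature layout, the index lists `ROBUST_IDX` and `SPURIOUS_IDX`, the Gaussian resampling, and both function names are hypothetical choices made for this example. In practice, deciding which features are spurious relies on domain knowledge.

```python
import random


def targeted_augment(x, spurious_idx, rng, scale=1.0):
    """Return a copy of x with only the listed (spurious) coordinates resampled."""
    out = list(x)
    for i in spurious_idx:
        # Assumption: spurious features are replaced by fresh Gaussian noise.
        out[i] = rng.gauss(0.0, scale)
    return out


def domain_invariant_augment(x, robust_idx, spurious_idx, rng, scale=1.0):
    """Resample every domain-dependent coordinate, robust and spurious alike."""
    all_domain_idx = list(robust_idx) + list(spurious_idx)
    return targeted_augment(x, all_domain_idx, rng, scale)


# Hypothetical layout: [object feature, robust domain feature, spurious, spurious]
x = [0.7, -1.2, 3.4, 0.5]
ROBUST_IDX = [1]
SPURIOUS_IDX = [2, 3]

rng = random.Random(0)
x_targeted = targeted_augment(x, SPURIOUS_IDX, rng)
x_invariant = domain_invariant_augment(x, ROBUST_IDX, SPURIOUS_IDX, rng)
```

Here `x_targeted` keeps the object feature and the robust domain feature (for instance, a habitat cue in the background) intact, while `x_invariant` also overwrites the robust feature and so discards information that is predictive OOD.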
