D-GAP: Improving Out-of-Domain Robustness via Dataset-Agnostic and Gradient-Guided Augmentation in Amplitude and Pixel Spaces
Ruoqi Wang, Haitao Wang, Shaojie Guo, Qiong Luo
TL;DR
The paper tackles out-of-domain (OOD) robustness under real-world domain shifts by proposing D-GAP, a dataset-agnostic augmentation that operates in both the frequency (amplitude) space and the pixel space. It uses gradient-guided amplitude interpolation: a sensitivity map $G(u,v)$ is converted into a per-frequency mixing map $D(u,v)$, so that more strongly biased frequencies are moved further toward the target domain, while a complementary pixel-space blending step preserves spatial detail. Key contributions include the gradient-guided dual-space augmentation, a connectivity-based perspective for evaluating cross-domain feature changes, and state-of-the-art OOD results on four real-world datasets and three standard benchmarks. The method improves average OOD performance by about $+5.3\%$ on real-world data and $+1.8\%$ on benchmarks without dataset-specific tuning. Its main limitation is the additional gradient computation required during training; future work aims to improve efficiency and to integrate the method with foundation models or self-supervised objectives.
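The amplitude-interpolation idea summarized above can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: images are single-channel 2-D arrays, the min-max normalization that turns $G(u,v)$ into $D(u,v)$ is a hypothetical choice, keeping the source phase is an assumption (following the common observation that phase carries most of the structural content), and the function names and the pixel-blend weight `lam` are illustrative only. Per frequency, the sketch computes $A_{\text{mix}} = (1-D)\,A_{\text{src}} + D\,A_{\text{tgt}}$, so frequencies with a larger sensitivity receive more of the target amplitude.

```python
import numpy as np


def mixing_map(G, eps=1e-8):
    """Map a sensitivity map G(u, v) to a mixing map D(u, v) in [0, 1].

    Hypothetical choice: min-max normalization of |G|, so the most
    sensitive (assumed most biased) frequencies get D close to 1.
    """
    G = np.abs(G)
    return (G - G.min()) / (G.max() - G.min() + eps)


def amplitude_interpolate(x_src, x_tgt, D):
    """Interpolate amplitude spectra per frequency, keeping the source phase."""
    F_src = np.fft.fft2(x_src)
    F_tgt = np.fft.fft2(x_tgt)
    # Larger D(u, v) pulls that frequency's amplitude toward the target sample.
    amp_mix = (1.0 - D) * np.abs(F_src) + D * np.abs(F_tgt)
    # Recombine with the source phase. np.real drops the imaginary part,
    # which is only round-off when D(u, v) = D(-u, -v).
    return np.real(np.fft.ifft2(amp_mix * np.exp(1j * np.angle(F_src))))


def pixel_blend(x_amp, x_src, lam=0.5):
    """Hypothetical pixel-space step: convex blend with the original source image."""
    return lam * x_amp + (1.0 - lam) * x_src
```

Setting `D` to all zeros returns the source image and setting it to all ones transfers the full target amplitude spectrum, so the normalized map places each frequency between these two extremes according to its sensitivity. A full augmentation under these assumptions would read `pixel_blend(amplitude_interpolate(x_src, x_tgt, mixing_map(G)), x_src, lam)`.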
Abstract
Out-of-domain (OOD) robustness is challenging to achieve in real-world computer vision applications, where shifts in image background, style, and acquisition instruments often degrade model performance. Generic augmentations yield inconsistent gains under such shifts, whereas dataset-specific augmentations require expert knowledge and prior analysis. Moreover, prior studies show that neural networks adapt poorly to domain shifts because they exhibit a learning bias toward domain-specific frequency components. Perturbing frequency values can mitigate this bias but overlooks pixel-level details, leading to suboptimal performance. To address these problems, we propose D-GAP (Dataset-agnostic and Gradient-guided augmentation in Amplitude and Pixel spaces), which improves OOD robustness by applying targeted augmentation in both the amplitude (frequency) space and the pixel space. Unlike conventional handcrafted augmentations, D-GAP computes sensitivity maps in the frequency space from task gradients, which reflect how strongly the model responds to different frequency components, and uses these maps to adaptively interpolate amplitudes between source and target samples. In this way, D-GAP reduces the learning bias in the frequency space, while a complementary pixel-space blending procedure restores fine spatial details. Extensive experiments on four real-world datasets and three domain-adaptation benchmarks show that D-GAP consistently outperforms both generic and dataset-specific augmentations, improving average OOD performance by +5.3% on real-world datasets and +1.8% on benchmark datasets.
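To make "sensitivity maps in the frequency space from task gradients" concrete, the sketch below differentiates a toy task loss with respect to the amplitude spectrum. It is a hypothetical stand-in, not the paper's procedure: the "model" is a fixed linear score $L(x) = \sum_{m,n} w_{mn} x_{mn}$ with assumed weights `w`, and the image is rebuilt as $x = \mathrm{Re}\big(\mathcal{F}^{-1}[A\,e^{i\phi}]\big)$. Because $x$ is linear in $A$, the gradient has the closed form $\partial L / \partial A(u,v) = \mathrm{Re}\big(e^{i\phi(u,v)}\,[\mathcal{F}^{-1} w](u,v)\big)$, where $\phi$ is the source phase and $\mathcal{F}^{-1}$ is the inverse DFT. For a neural network this gradient would instead come from automatic differentiation; taking its magnitude as $G(u,v)$ is also an assumption.

```python
import numpy as np


def loss_from_amplitude(A, phase, w):
    """Toy task loss: a fixed linear score of the image rebuilt from (A, phase)."""
    x = np.real(np.fft.ifft2(A * np.exp(1j * phase)))
    return float(np.sum(w * x))


def amplitude_gradient(x, w):
    """Closed-form dL/dA(u, v) for the toy linear loss, evaluated at image x.

    x = Re(ifft2(A * exp(i * phase))) is linear in A, so the gradient is
    Re(exp(i * phase) * ifft2(w)) and does not depend on A itself.
    """
    phase = np.angle(np.fft.fft2(x))
    return np.real(np.exp(1j * phase) * np.fft.ifft2(w))


def sensitivity_map(x, w):
    """Hypothetical sensitivity map G(u, v): gradient magnitude per frequency."""
    return np.abs(amplitude_gradient(x, w))
```

Because the toy loss is linear in the amplitude, a finite-difference check at any single frequency reproduces the analytic gradient, which makes the sketch easy to verify before swapping in a real model.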
