D-GAP: Improving Out-of-Domain Robustness via Dataset-Agnostic and Gradient-Guided Augmentation in Amplitude and Pixel Spaces

Ruoqi Wang, Haitao Wang, Shaojie Guo, Qiong Luo

TL;DR

The paper tackles out-of-domain robustness under real-world domain shifts by proposing D-GAP, a dataset-agnostic augmentation that operates in both the frequency (amplitude) and pixel spaces. It uses gradient-guided amplitude interpolation with a sensitivity map $G(u,v)$ to derive a per-frequency mixing map $D(u,v)$, ensuring more biased frequencies are nudged toward the target domain, while a complementary pixel-space blending preserves spatial detail. Key contributions include the gradient-guided dual-space augmentation, a connectivity-based perspective to evaluate cross-domain feature changes, and state-of-the-art OOD results on four real-world datasets and three standard benchmarks. The method yields average OOD gains of about $+5.3\%$ on real-world data and $+1.8\%$ on benchmarks, showing strong generalization without dataset-specific tuning; limitations include additional gradient computation during training, with future work aimed at improving efficiency and integration with foundation models or self-supervised objectives.

Abstract

Out-of-domain (OOD) robustness is challenging to achieve in real-world computer vision applications, where shifts in image background, style, and acquisition instruments often degrade model performance. Generic augmentations show inconsistent gains under such shifts, whereas dataset-specific augmentations require expert knowledge and prior analysis. Moreover, prior studies show that neural networks adapt poorly to domain shifts because they exhibit a learning bias toward domain-specific frequency components. Perturbing frequency values can mitigate this bias but overlooks pixel-level details, leading to suboptimal performance. To address these problems, we propose D-GAP (Dataset-agnostic and Gradient-guided augmentation in Amplitude and Pixel spaces), which improves OOD robustness by introducing targeted augmentation in both the amplitude space (frequency space) and the pixel space. Unlike conventional handcrafted augmentations, D-GAP computes sensitivity maps in the frequency space from task gradients, which reflect how strongly the model responds to different frequency components, and uses these maps to adaptively interpolate amplitudes between source and target samples. In this way, D-GAP reduces the learning bias in frequency space, while a complementary pixel-space blending procedure restores fine spatial details. Extensive experiments on four real-world datasets and three domain-adaptation benchmarks show that D-GAP consistently outperforms both generic and dataset-specific augmentations, improving average OOD performance by +5.3% on real-world datasets and +1.8% on benchmark datasets.
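
The dual-space idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the helper name `amplitude_pixel_mix`, the linear interpolation rule, the `pixel_weight` parameter, and the choice to blend with the source image in pixel space are all assumptions made for illustration. The sketch mixes amplitudes per frequency according to a given mixing map, keeps the source phase, and then blends the result with the source image.

```python
import numpy as np

def amplitude_pixel_mix(x_src, x_tgt, mix_map, pixel_weight=0.5):
    """Hypothetical helper: per-frequency amplitude mix, then pixel blend.

    mix_map holds values in [0, 1] per frequency (0 keeps the source
    amplitude, 1 takes the target amplitude). This rule is an assumption.
    """
    f_src = np.fft.fft2(x_src)
    f_tgt = np.fft.fft2(x_tgt)
    amp_src, phase_src = np.abs(f_src), np.angle(f_src)
    amp_tgt = np.abs(f_tgt)

    # Interpolate amplitudes frequency by frequency.
    amp_mix = (1.0 - mix_map) * amp_src + mix_map * amp_tgt

    # Recombine the mixed amplitude with the source phase.
    x_freq = np.real(np.fft.ifft2(amp_mix * np.exp(1j * phase_src)))

    # Assumed pixel-space step: blend with the source to keep spatial detail.
    return (1.0 - pixel_weight) * x_freq + pixel_weight * x_src

# Toy 8x8 grayscale "images" standing in for source and target samples.
rng = np.random.default_rng(0)
x_src = rng.random((8, 8))
x_tgt = rng.random((8, 8))

no_mix = amplitude_pixel_mix(x_src, x_tgt, np.zeros((8, 8)))
full_mix = amplitude_pixel_mix(x_src, x_tgt, np.ones((8, 8)), pixel_weight=0.0)
```

With an all-zero mixing map the output reduces to the source image, and with an all-one map and no pixel blending the output carries the target amplitude spectrum on the source phase, which is a convenient sanity check for any variant of this scheme.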

Paper Structure

This paper contains 23 sections, 11 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Feature decomposition and augmentation examples. Top: For each dataset (iWildCam, Camelyon17, BirdCalls, Galaxy10), we show a source image and its corresponding augmented image generated by our method. Bottom: Based on the decomposition framework of Shen et al. (2022), we annotate representative features in the datasets across $x_{\text{obj}}$, $x_{d\text{:robust}}$, $x_{d\text{:spu}}$, and $x_{\text{noise}}$. Our method effectively randomizes $x_{d\text{:spu}}$ and varies $x_{d\text{:robust}}$ while preserving $x_{\text{obj}}$.
  • Figure 2: The figure shows the amplitude-only and phase-only reconstructions of the two input images, and the process where the amplitudes of Image 1 and Image 2 are first mixed, and the mixed amplitude is combined with the phase of Image 2 to generate an augmentation of Image 2.
  • Figure 3: Overview of the Gradient-guided Amplitude Mix procedure. Given a source image $x_1$ and a target-domain image $x_2$, we compute the Sensitivity Map $G(u,v)$ and the Mixing Map $D(u,v)$. The mixed amplitude is then combined with the phase of $x_1$.
  • Figure 4: We plot the in-domain (ID) performance versus out-of-domain (OOD) performance for all methods across four datasets. Our method consistently outperforms all baselines in OOD generalization. 'CL' refers to 'Connect Later'. 'CP' refers to 'Copy Paste'. 'SCJ' refers to 'Stain Color Jitter'.
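
The Figure 3 caption mentions a sensitivity map $G(u,v)$ and a mixing map $D(u,v)$ but this section does not give their formulas. The following NumPy sketch shows one plausible reading, under stated assumptions: the sensitivity is the magnitude of the loss gradient with respect to each amplitude (obtained by the chain rule through the inverse FFT with the phase held fixed), min-max normalized to $[0, 1]$, and the mixing map is a scaled copy of it. The helper names, the `max_mix` parameter, and the toy linear model are hypothetical and stand in for a real network and its task loss.

```python
import numpy as np

def amplitude_gradient(x, grad_x):
    """dL/dA(u,v) given dL/dx, treating each amplitude as a free variable.

    Uses x = Re(ifft2(A * exp(i*phase))) with the phase held fixed.
    """
    phase = np.angle(np.fft.fft2(x))
    return np.real(np.exp(1j * phase) * np.fft.ifft2(grad_x))

def sensitivity_map(grad_amp):
    """Assumed normalization: absolute gradient, min-max scaled to [0, 1]."""
    g = np.abs(grad_amp)
    return (g - g.min()) / (g.max() - g.min() + 1e-12)

def mixing_map(sensitivity, max_mix=1.0):
    """Assumed rule: more sensitive frequencies move further toward the target."""
    return max_mix * sensitivity

# Toy stand-in for a model: linear score w.x with squared-error loss.
rng = np.random.default_rng(0)
x = rng.random((8, 8))
w = rng.standard_normal((8, 8))
y = 1.0

residual = np.sum(w * x) - y
grad_x = residual * w  # dL/dx for L = 0.5 * (w.x - y)^2

grad_amp = amplitude_gradient(x, grad_x)
G = sensitivity_map(grad_amp)
D = mixing_map(G, max_mix=0.8)
```

In a real training loop `grad_x` would come from backpropagation through the network rather than from a closed-form expression, and the resulting `D` would be the per-frequency weight used when interpolating source and target amplitudes.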