Crafting Distribution Shifts for Validation and Training in Single Source Domain Generalization

Nikos Efthymiadis, Giorgos Tolias, Ondřej Chum

TL;DR

This work tackles single-source domain generalization by introducing an independent augmented validation set constructed from a broad spectrum of source-domain augmentations, which yields a strong correlation with target-domain test performance and enables better method selection and hyperparameter tuning. The authors also propose a shape-biased recognition approach that uses edge-based shape representations (binary thin edges via Sobel/BTE) during training and testing, coupled with a two-fold cross-validation over augmentation groups to prevent training-validation leakage. Across five diverse datasets, the augmented validation consistently outperforms standard validation, achieving near-oracle performance and state-of-the-art results when combined with the proposed training/testing schemes. The combination of exhaustive augmentations for validation and explicit shape information enhances robustness to distribution shifts and offers a practical, automated protocol for SSDG research and deployment.
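The two-fold cross-validation over augmentation groups mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the group names, the augmentation names, and the alternating assignment of groups to halves are all assumptions made for the example. The only property taken from the text is that augmentations used for training in a fold never appear in that fold's validation set.

```python
from typing import Dict, List, Tuple

Fold = Tuple[List[str], List[str]]  # (training augmentations, validation augmentations)


def two_fold_augmentation_splits(groups: Dict[str, List[str]]) -> List[Fold]:
    """Split augmentation groups into two halves and swap their roles across folds."""
    names = sorted(groups)   # deterministic order
    half_a = names[0::2]     # assumption: alternate groups between the two halves
    half_b = names[1::2]

    def flatten(keys: List[str]) -> List[str]:
        return [aug for key in keys for aug in groups[key]]

    # Fold 0 trains on half A and validates on half B; fold 1 swaps them.
    return [
        (flatten(half_a), flatten(half_b)),
        (flatten(half_b), flatten(half_a)),
    ]


# Hypothetical augmentation groups, for illustration only.
AUG_GROUPS = {
    "blur": ["gaussian_blur", "motion_blur"],
    "color": ["hue_shift", "posterize"],
    "geometric": ["rotate", "shear"],
    "noise": ["gaussian_noise", "shot_noise"],
}

FOLDS = two_fold_augmentation_splits(AUG_GROUPS)

for index, (train_augs, val_augs) in enumerate(FOLDS):
    print(f"fold {index}: train={train_augs} val={val_augs}")
```

Splitting at the level of whole groups, rather than individual augmentations, keeps closely related transformations on the same side of the split, which is one plausible way to avoid training-validation leakage.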

Abstract

Single-source domain generalization attempts to learn a model on a source domain and deploy it to unseen target domains. Limiting access only to source domain data poses two key challenges: how to train a model that can generalize and how to verify that it does. The standard practice of validation on the training distribution does not accurately reflect the model's generalization ability, while validation on the test distribution is malpractice and must be avoided. In this work, we construct an independent validation set by transforming source domain images with a comprehensive list of augmentations, covering a broad spectrum of potential distribution shifts in target domains. We demonstrate a high correlation between validation and test performance for multiple methods and across various datasets. The proposed validation achieves a relative accuracy improvement over the standard validation equal to 15.4% or 1.6% when used for method selection or learning rate tuning, respectively. Furthermore, we introduce a novel family of methods that increase the shape bias through enhanced edge maps. To benefit from the augmentations during training and preserve the independence of the validation set, a k-fold validation process is designed to separate the augmentation types used in training and validation. The method that achieves the best performance on the augmented validation is selected from the proposed family. It achieves state-of-the-art performance on various standard benchmarks. Code at: https://github.com/NikosEfth/crafting-shifts
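To make the method-selection comparison concrete, the sketch below picks a model from a pool by standard validation, by augmented validation, and by an oracle with test access, then reports the relative improvement. The pool, its accuracies, and the field names are invented for illustration and are not results from the paper. The relative improvement is assumed to be the usual ratio (new - base) / base.

```python
from typing import Dict, List, Union

Model = Dict[str, Union[str, float]]


def select_model(pool: List[Model], key: str) -> Model:
    """Return the model with the highest score under the given criterion."""
    return max(pool, key=lambda model: model[key])


def relative_improvement(new_acc: float, base_acc: float) -> float:
    """Relative improvement in percent, assuming the (new - base) / base definition."""
    return 100.0 * (new_acc - base_acc) / base_acc


# Hypothetical pool: standard validation saturates, augmented validation tracks test.
POOL: List[Model] = [
    {"name": "A", "val_standard": 0.97, "val_augmented": 0.60, "test": 0.50},
    {"name": "B", "val_standard": 0.95, "val_augmented": 0.72, "test": 0.60},
    {"name": "C", "val_standard": 0.96, "val_augmented": 0.70, "test": 0.58},
]

picked_standard = select_model(POOL, "val_standard")    # selects model A
picked_augmented = select_model(POOL, "val_augmented")  # selects model B
oracle = select_model(POOL, "test")                     # needs test access; reference only

gain = relative_improvement(picked_augmented["test"], picked_standard["test"])
oracle_fraction = picked_augmented["test"] / oracle["test"]

print(f"standard picks {picked_standard['name']}, augmented picks {picked_augmented['name']}")
print(f"relative improvement: {gain:.1f}%, fraction of oracle: {oracle_fraction:.2f}")
```

In this toy pool the standard validation scores are nearly identical across models, so the selection is close to arbitrary, while the augmented validation ranks the models in the same order as the test accuracy.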

Paper Structure (48 sections, 15 figures, 5 tables)

Figures (15)

  • Figure 1: Test accuracy comparison of different validation methods across seven dataset-backbone configurations. Each axis ranges from $70\%$ to $100\%$ of the oracle validation $V_O$ that assumes access to the test set. The proposed augmented validation $V_A$ achieves over $\boldsymbol{98\%}$ of the oracle performance on average, representing a $\boldsymbol{15.4\%}$ relative improvement over the standard validation $V_S$. For each dataset, the model is chosen from a pool, with $4,500$ trained models across all pools.
  • Figure 2: "Hummingbird" in four different domains. The domain generalization task tacitly assumes domains that share informative, human-understandable features, such as texture, shape, or semantics. Therefore, the unseen domains are expected to be human-recognizable. Images from ImageNet [dsl+09] and ImageNet-R [imgnetr].
  • Figure 3: Overview of the training, validation, and testing pipeline. Training images are augmented with the basic augmentations and a subset of the extra augmentations. The shape information, encoded as binary thin edges of the augmented image, generates an additional training example. The contribution of the two losses, image-based and shape-based, is weighted by a parameter $\lambda$. In validation, extra augmentations that were not included in training are used to synthesize unseen distributions. The shape information is optionally exploited in the testing and validation phases. The final prediction is obtained by ensembling the image-based and shape-based predictions, weighted by a parameter $w$.
  • Figure 4: Correlation between validation and test accuracy across the proposed variants. Standard validation $V_S$ is performed on the validation set of the source domain, while the proposed augmented validation $V_A$ uses images altered by augmentations unseen during training. Each point represents a different training-testing model variant.
  • Figure 5: Correlation between validation and test accuracy across literature methods and our main variant $\hat{I}S \rightarrow I^{.75}S$ using the standard $V_S$ and augmented $V_A$ validation sets. The best model, according to each validation performance, is marked with a star. The test performance of the best model per validation set is summarized in the bar plot. $V_A$ achieves significant test accuracy improvements of $22.2$ in PACS and $7.3$ in Mini-DomainNet.
  • ...and 10 more figures
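
The two weighting parameters described in the Figure 3 caption can be illustrated with a short sketch. The caption does not give the exact formulas, so the forms used here are assumptions: the training loss is taken as image loss plus $\lambda$ times shape loss, and the test-time ensemble as a convex combination of softmax probabilities with weight $w$ on the image branch. The function names and the logits are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def combined_loss(loss_image: float, loss_shape: float, lam: float) -> float:
    """Assumed form: image loss plus lambda-weighted shape (edge-map) loss."""
    return loss_image + lam * loss_shape


def ensemble(image_logits: Sequence[float], shape_logits: Sequence[float], w: float) -> List[float]:
    """Assumed form: convex combination of the two branches' class probabilities."""
    p_image = softmax(image_logits)
    p_shape = softmax(shape_logits)
    return [w * a + (1.0 - w) * b for a, b in zip(p_image, p_shape)]


# Hypothetical logits for a three-class problem.
image_logits = [2.0, 1.0, 0.1]
shape_logits = [0.5, 2.5, 0.2]

probs = ensemble(image_logits, shape_logits, w=0.75)
prediction = max(range(len(probs)), key=probs.__getitem__)
print(f"ensemble probabilities: {[round(p, 3) for p in probs]}, predicted class: {prediction}")
```

Under these assumptions, $w = 1$ reduces to an image-only prediction and $w = 0$ to a shape-only prediction, so a single parameter moves smoothly between the two branches.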