Pseudo Multi-Source Domain Generalization: Bridging the Gap Between Single and Multi-Source Domain Generalization

Shohei Enomoto

TL;DR

This work introduces Pseudo Multi-source Domain Generalization (PMDG), a framework that makes Multi-source Domain Generalization (MDG) algorithms applicable in Single-source Domain Generalization (SDG) settings by generating multiple pseudo-domains from a single source domain via style transfer and data augmentation. Using PseudoDomainBed, the authors evaluate PMDG across multiple datasets and architectures, showing that pseudo-domains can match or exceed actual multi-domain performance given sufficient data and appropriate transformations. MDG and PMDG performance are positively correlated, which suggests PMDG as a practical bridge between SDG and MDG. The findings highlight the importance of transformation choice, backbone architecture, and source-domain selection, and they point to future work on a theoretical understanding of when pseudo-domains can substitute for real domain variation.

Abstract

Deep learning models often struggle to maintain performance when deployed on data distributions different from their training data, particularly in real-world applications where environmental conditions frequently change. While Multi-source Domain Generalization (MDG) has shown promise in addressing this challenge by leveraging multiple source domains during training, its practical application is limited by the significant costs and difficulties associated with creating multi-domain datasets. To address this limitation, we propose Pseudo Multi-source Domain Generalization (PMDG), a novel framework that enables the application of sophisticated MDG algorithms in more practical Single-source Domain Generalization (SDG) settings. PMDG generates multiple pseudo-domains from a single source domain through style transfer and data augmentation techniques, creating a synthetic multi-domain dataset that can be used with existing MDG algorithms. Through extensive experiments with PseudoDomainBed, our modified version of the DomainBed benchmark, we analyze the effectiveness of PMDG across multiple datasets and architectures. Our analysis reveals several key findings, including a positive correlation between MDG and PMDG performance and the potential of pseudo-domains to match or exceed actual multi-domain performance with sufficient data. These comprehensive empirical results provide valuable insights for future research in domain generalization. Our code is available at https://github.com/s-enmt/PseudoDomainBed.
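The pseudo-domain construction described above can be illustrated with a minimal sketch. It is not the PseudoDomainBed implementation: the images are toy lists of pixel intensities, the transformations (`identity`, `invert`, `brightness_jitter`) are simple stand-ins for the style transfer and data augmentation techniques used in the paper, and the helper name `make_pseudo_domains` is hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

# Toy stand-ins (assumptions for illustration only):
Image = List[float]         # flat list of pixel intensities in [0, 1]
Sample = Tuple[Image, int]  # (image, class label)
Transform = Callable[[Image, random.Random], Image]


def identity(img: Image, rng: random.Random) -> Image:
    """Keep the original sample; the source itself serves as one domain."""
    return list(img)


def invert(img: Image, rng: random.Random) -> Image:
    """Invert intensities, a crude proxy for a style change."""
    return [1.0 - p for p in img]


def brightness_jitter(img: Image, rng: random.Random) -> Image:
    """Shift all pixels by one random offset, clipped to [0, 1]."""
    shift = rng.uniform(-0.2, 0.2)
    return [min(1.0, max(0.0, p + shift)) for p in img]


def make_pseudo_domains(
    source: List[Sample],
    transforms: Dict[str, Transform],
    seed: int = 0,
) -> Dict[str, List[Sample]]:
    """Apply each transformation to every source sample.

    Each transformation yields one pseudo-domain; labels are unchanged.
    """
    rng = random.Random(seed)
    return {
        name: [(transform(img, rng), label) for img, label in source]
        for name, transform in transforms.items()
    }


# A single source domain with two labeled samples (made-up values).
source = [([0.0, 0.5, 1.0], 0), ([0.2, 0.4, 0.6], 1)]
transforms = {
    "original": identity,
    "inverted": invert,
    "brightness": brightness_jitter,
}
pseudo_domains = make_pseudo_domains(source, transforms)
```

Under these assumptions, `pseudo_domains` has the same shape as a multi-domain dataset (one list of labeled samples per domain), so an MDG algorithm can treat each entry as if it were a separately collected source domain.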

Paper Structure

This paper contains 30 sections, 3 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of PMDG framework. PMDG applies multiple transformations to training samples to generate pseudo-domains. The DNN is then trained using an MDG algorithm on these pseudo-domains. The trained DNN aims to be robust against unknown domains.
  • Figure 2: Visualization of the transformed sample. We performed different types of transformations on the dog images of the PACS dataset.
  • Figure 3: Accuracy gains over the ERM baseline without pseudo-domains across different transformation techniques (y-axis) and MDG algorithms (x-axis) on the VLCS dataset. Green and red indicate performance improvement and degradation, respectively. Values are accuracy differences relative to ERM without pseudo-domains.
  • Figure 4: Accuracy comparison of MDG algorithms across MDG and PMDG settings. Each point represents a different MDG algorithm. Accuracy represents averages across PACS, VLCS, OfficeHome, and TerraIncognita datasets.
  • Figure 5: Comparison of accuracy between MDG and PMDG settings under equal training data conditions. The x-axis shows the total number of training samples, while the y-axis shows the accuracy. Blue and orange lines represent MDG and PMDG settings, respectively.
  • ...and 2 more figures