Single Domain Generalization with Model-aware Parametric Batch-wise Mixup

Marzi Heidari, Yuhong Guo

TL;DR

This work tackles Single Domain Generalization (SDG) by proposing Model-aware Parametric Batch-wise Mixup (MPBM), a data augmentation framework that synthesizes informative instances to improve cross-domain robustness. MPBM uses a batch-wise, parameterized mixup generator guided by model-aware adversarial queries generated via stochastic gradient Langevin dynamics, coupled with an attention mechanism over features. An adversarial alignment loss with a discriminator regularizes the generator to prevent excessive deviation from real data. Across digits, PACS, and DomainNet benchmarks, MPBM achieves state-of-the-art SDG performance, demonstrating its effectiveness in expanding the representation space and enhancing generalization to unseen domains.

Abstract

Single Domain Generalization (SDG) remains a formidable challenge in machine learning, particularly when models are deployed in environments that differ significantly from their training domains. In this paper, we propose a novel data augmentation approach, named Model-aware Parametric Batch-wise Mixup (MPBM), to tackle the challenge of SDG. MPBM employs adversarial queries generated with stochastic gradient Langevin dynamics, and produces model-aware augmenting instances with a parametric batch-wise mixup generator network built on an attention mechanism. By exploiting inter-feature correlations, the parameterized mixup generator gains additional flexibility in combining features across a batch of instances, which enhances its capacity to generate adaptive and informative synthetic instances for specific queries. The synthetic data produced by this generator network, guided by informative queries, is expected to enrich the representation space covered by the original training dataset and thereby improve the prediction model's generalizability across diverse and previously unseen domains. To prevent excessive deviation from the training data, we further incorporate a real-data alignment-based adversarial loss into the learning process of MPBM, which regularizes any tendency toward undesirable expansion. We conduct extensive experiments on several benchmark datasets. The empirical results demonstrate that, by augmenting the training set with informative synthetic data, our proposed MPBM method achieves state-of-the-art performance for single domain generalization.
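As a rough illustration of how adversarial queries might be drawn with stochastic gradient Langevin dynamics, the sketch below perturbs inputs along the gradient of the classification loss and injects Gaussian noise at every step. It is not the paper's procedure: the linear softmax classifier (`W`, `b`), the function names, and the hyper-parameters `steps`, `step_size`, and `noise_scale` are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_input_grad(x, y, W, b):
    """Mean cross-entropy of a linear softmax classifier (an assumed stand-in
    for the real model) and the per-sample gradient of each sample's loss
    with respect to its own input."""
    p = softmax(x @ W + b)
    n = x.shape[0]
    loss = -np.log(p[np.arange(n), y] + 1e-12).mean()
    onehot = np.eye(W.shape[1])[y]
    grad_x = (p - onehot) @ W.T  # shape (n, d)
    return loss, grad_x

def langevin_queries(x, y, W, b, steps=5, step_size=0.1,
                     noise_scale=0.01, seed=0):
    """Hypothetical Langevin-style query generation: ascend the loss with
    respect to the input and add Gaussian noise at each step."""
    rng = np.random.default_rng(seed)
    q = x.copy()
    for _ in range(steps):
        _, g = loss_and_input_grad(q, y, W, b)
        noise = rng.standard_normal(q.shape)
        q = q + step_size * g + noise_scale * np.sqrt(2.0 * step_size) * noise
    return q
```

Setting `noise_scale=0.0` reduces the update to plain gradient ascent on the loss, which is a convenient way to check that the queries do become harder for the assumed classifier.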

Paper Structure

This paper contains 18 sections, 12 equations, 2 figures, 4 tables, 1 algorithm.

Figures (2)

  • Figure 1: Overview of the training process for the mixup generator. The process begins with a pre-trained feature extractor $f_{\theta}$ and classifier $h_{\psi}$, whose parameters ($\theta$ and $\psi$) remain frozen during the training of the Mixup Generator Network (MGN) $g_{\phi}$. Query image inputs undergo Langevin Stochastic Query Augmentation. The Entry-Wise Feature Attention Mechanism captures interactions among feature dimensions by incorporating a correlation matrix, enabling precise feature mixups. The mixup generation loss $\mathcal{L}_{\text{mix}}^{\text{gen}}$ is optimized on the generated mixup samples. The Adversarial Mixup Generation component aligns generated mixup features with real data through an adversarial training framework, optimizing the adversarial loss $\mathcal{L}_{\text{adv}}(\phi, \omega)$.
  • Figure 2: Sensitivity analysis for four hyper-parameters, $\lambda_{\text{adv}}$, $\lambda_{\text{mix}}$, $T$, and $N_b$, on the USPS dataset.
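
The entry-wise attention described in the Figure 1 caption can be pictured with a small sketch. This is one plausible reading rather than the paper's architecture: the scoring rule, the projection matrices `Wq` and `Wk`, the use of `np.corrcoef` for the correlation matrix, and the label-mixing rule are all assumptions. Each feature dimension gets its own softmax weights over the batch, so the mixed feature is a per-dimension convex combination of the batch features.

```python
import numpy as np

def entrywise_batch_mixup(query, feats, labels_onehot, Wq, Wk, temperature=1.0):
    """Hypothetical entry-wise attention mixup over a batch.

    query:         (d,)   feature vector of the query
    feats:         (B, d) features of the batch instances (B >= 2, d >= 2)
    labels_onehot: (B, k) one-hot labels of the batch instances
    Wq, Wk:        (d, d) assumed learnable projections
    """
    # Feature-feature correlation matrix estimated from the batch (assumption).
    corr = np.nan_to_num(np.corrcoef(feats, rowvar=False))  # (d, d)
    # Entry-wise query-key products, propagated through the correlations.
    scores = ((query @ Wq) * (feats @ Wk)) @ corr / temperature  # (B, d)
    # Softmax over the batch axis, separately for every feature dimension.
    scores = scores - scores.max(axis=0, keepdims=True)
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=0, keepdims=True)
    # Each output entry is a convex combination of that entry across the batch.
    mixed_feat = (attn * feats).sum(axis=0)  # (d,)
    # Instance-level weights: attention averaged over feature dimensions.
    inst_w = attn.mean(axis=1)  # (B,), sums to 1
    mixed_label = inst_w @ labels_onehot  # (k,)
    return mixed_feat, mixed_label, attn
```

Unlike standard mixup, which applies a single scalar coefficient to a pair of instances, this form lets every feature dimension draw on a different weighting of the whole batch, which is the extra flexibility the abstract attributes to the parametric generator.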