What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization

Xavier Thomas, Deepti Ghadiyaram

TL;DR

This work investigates how pre-training objectives and model architectures shape latent representations for domain generalization and introduces GUIDE, a simple framework that discovers unsupervised pseudo-domains from frozen feature spaces and augments classifiers with these latent domain signals. The authors show that diffusion-model features excel at separating domains without explicit domain labels, enabling robust generalization across unseen domains and achieving competitive gains over ERM and domain-label-dependent methods on DomainBed benchmarks. The approach relies on unsupervised pseudo-domain learning via Kernel Mean Embeddings, followed by transforming and concatenating pseudo-domain signals with standard features to train a domain-adaptive classifier. Empirically, GUIDE with diffusion features yields notable improvements (up to +4% on TerraIncognita and higher on others) and demonstrates strong scalability to large datasets like DomainNet, highlighting the practical value of leveraging frozen diffusion latent spaces for domain generalization.
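The summary above does not say how Kernel Mean Embeddings are computed or used inside GUIDE. As background only, a kernel mean embedding represents a set of feature vectors by the mean of their kernel feature maps, and the distance between two such embeddings is the maximum mean discrepancy (MMD). The sketch below is a toy illustration of that idea, not the authors' procedure: the RBF kernel, the `gamma` value, the helper names, and the 2-D "features" are all assumptions made for the example.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def mean_kernel(xs, ys, gamma=1.0):
    """Average kernel value over all pairs (x, y), i.e. <mu_X, mu_Y> in the RKHS."""
    total = sum(rbf_kernel(x, y, gamma) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, gamma=1.0):
    """Biased estimate of ||mu_X - mu_Y||^2 between two kernel mean embeddings."""
    return (
        mean_kernel(xs, xs, gamma)
        + mean_kernel(ys, ys, gamma)
        - 2.0 * mean_kernel(xs, ys, gamma)
    )


# Hypothetical 2-D features standing in for frozen backbone outputs.
photo_like = [(0.0, 0.1), (0.1, 0.0), (-0.1, 0.05)]
photo_like_2 = [(0.05, 0.0), (0.0, -0.1), (0.1, 0.1)]
sketch_like = [(3.0, 3.1), (2.9, 3.0), (3.1, 2.95)]

same = mmd_squared(photo_like, photo_like_2)
different = mmd_squared(photo_like, sketch_like)
print(f"MMD^2, similar sets:   {same:.4f}")
print(f"MMD^2, different sets: {different:.4f}")
```

Sets drawn from the same region of feature space give a small MMD, while well-separated sets give a large one, which is the sense in which a feature space can "separate domains" without domain labels.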

Abstract

Domain Generalization aims to develop models that can generalize to novel and unseen data distributions. In this work, we study how model architectures and pre-training objectives impact feature richness and propose a method to effectively leverage them for domain generalization. Specifically, given a pre-trained feature space, we first discover latent domain structures, referred to as pseudo-domains, that capture domain-specific variations in an unsupervised manner. Next, we augment existing classifiers with these complementary pseudo-domain representations, making them more amenable to diverse unseen test domains. We analyze how different pre-training feature spaces differ in the domain-specific variances they capture. Our empirical studies reveal that features from diffusion models excel at separating domains in the absence of explicit domain labels and capture nuanced domain-specific information. On 5 datasets, we show that our very simple framework improves generalization to unseen domains, with a maximum test accuracy gain of over 4% compared to the standard baseline, Empirical Risk Minimization (ERM). Crucially, our method outperforms most algorithms that access domain labels during training.

Paper Structure

This paper contains 26 sections, 2 equations, 24 figures, 15 tables, and 1 algorithm.

Figures (24)

  • Figure 1: t-SNE visualization of the latent space from different pre-training objectives: CLIP, DiT, MAE, and ResNet-50 on the domain generalization benchmark VLCS. VLCS is curated from $4$ different datasets, thus dataset-specific biases like spatial composition and object size variations serve as different domains. Note how the diffusion features separate domains effectively, suggesting that latent domain structures can be captured without explicit supervision. Best viewed in color.
  • Figure 2: Training Pipeline. The green-shaded region represents the clustering and transformation step. Green solid arrows indicate gradient flow, while red arrows represent non-gradient operations. The feature extractor ${\bm \Psi}$ first clusters samples to compute the pseudo-domain centroids. The transformation function $\mathcal{T}$ then transforms these centroids to the latent space of ${\bm \Phi}$, producing transformed pseudo-domain centroids, which are concatenated with the features from ${\bm \Phi}$, and sent to the classifier.
  • Figure 3: t-SNE visualization of how pseudo-domains are clustered together in the latent space of DiT for PACS. Note how the sketch domain forms distinct clusters, with light and dark pencil strokes mapped to separate regions in the latent space. Best viewed in color.
  • Figure 4: Pseudo-domains captured in the diffusion latent space of DiT on PACS. The clusters group images based on nuanced style-specific variances rather than class-specific variances.
  • Figure 5: Example images from Synth-Artists and Synth-Photography, generated using Stable Diffusion XL. Synth-Artists includes artistic styles such as Van Gogh and Kinkade, while Synth-Photography captures photography effects like Tilt-Shift and Bokeh.
  • ...and 19 more figures
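
The data flow described in the Figure 2 caption can be sketched roughly as follows. This is a minimal forward-pass illustration under stated assumptions, not the authors' implementation: plain k-means stands in for the clustering step (the caption does not name the algorithm), the transformation $\mathcal{T}$ is a fixed, hypothetical linear map rather than a learned one, and the toy feature lists, `T_weight`, and all function names are invented for the example. Training and gradient flow are not shown.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))


def kmeans(points, k, iters=20):
    """Plain k-means; an assumed stand-in for the pseudo-domain clustering step."""
    # Farthest-first initialisation keeps this toy example deterministic.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        assign = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, [nearest(p, centroids) for p in points]


def linear_transform(vec, weight):
    """Apply a linear map; `weight` has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


# Hypothetical frozen features from Psi (2-D) and trainable-backbone features from Phi (3-D).
psi_feats = [(0.0, 0.1), (0.1, 0.0), (5.0, 5.1), (5.1, 4.9)]
phi_feats = [(1.0, 0.0, 0.5), (0.9, 0.1, 0.4), (0.2, 0.8, 0.1), (0.1, 0.9, 0.0)]
# Hypothetical weights for T, mapping Psi space (2-D) into Phi space (3-D).
T_weight = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

# 1. Cluster Psi features to obtain pseudo-domain centroids.
centroids, assign = kmeans(psi_feats, k=2)
# 2. Transform each centroid into the latent space of Phi.
transformed = [linear_transform(c, T_weight) for c in centroids]
# 3. Concatenate each sample's Phi feature with its transformed centroid.
classifier_inputs = [list(f) + transformed[a] for f, a in zip(phi_feats, assign)]

for row in classifier_inputs:
    print([round(v, 3) for v in row])
```

Each row printed at the end is what the classifier would receive: the sample's own feature followed by a pseudo-domain signal shared by every sample in the same cluster.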