What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization
Xavier Thomas, Deepti Ghadiyaram
TL;DR
This work investigates how pre-training objectives and model architectures shape latent representations for domain generalization and introduces GUIDE, a simple framework that discovers unsupervised pseudo-domains from frozen feature spaces and augments classifiers with these latent domain signals. The authors show that diffusion-model features excel at separating domains without explicit domain labels, enabling robust generalization across unseen domains and achieving competitive gains over ERM and domain-label-dependent methods on DomainBed benchmarks. The approach relies on unsupervised pseudo-domain learning via Kernel Mean Embeddings, followed by transforming and concatenating pseudo-domain signals with standard features to train a domain-adaptive classifier. Empirically, GUIDE with diffusion features yields notable improvements (up to +4% on TerraIncognita and higher on others) and demonstrates strong scalability to large datasets like DomainNet, highlighting the practical value of leveraging frozen diffusion latent spaces for domain generalization.
Abstract
Domain Generalization aims to develop models that can generalize to novel and unseen data distributions. In this work, we study how model architectures and pre-training objectives impact feature richness and propose a method to effectively leverage them for domain generalization. Specifically, given a pre-trained feature space, we first discover latent domain structures, referred to as pseudo-domains, that capture domain-specific variations in an unsupervised manner. Next, we augment existing classifiers with these complementary pseudo-domain representations making them more amenable to diverse unseen test domains. We analyze how different pre-training feature spaces differ in the domain-specific variances they capture. Our empirical studies reveal that features from diffusion models excel at separating domains in the absence of explicit domain labels and capture nuanced domain-specific information. On 5 datasets, we show that our very simple framework improves generalization to unseen domains by a maximum test accuracy improvement of over 4% compared to the standard baseline Empirical Risk Minimization (ERM). Crucially, our method outperforms most algorithms that access domain labels during training.
