
A Sampling-Based Domain Generalization Study with Diffusion Generative Models

Ye Zhu, Yu Wu, Duo Xu, Zhiwei Deng, Yan Yan, Olga Russakovsky

TL;DR

We address domain generalization for diffusion models in a few-shot setting by introducing a sampling-based approach that inverts unseen data to obtain latent priors. The method leverages deterministic DDIM denoising on frozen pre-trained diffusion models to sample from OOD latent priors, which are approximately Gaussian and separable from the in-domain prior, enabling synthesis of unseen-domain data without fine-tuning. Empirical results across multiple natural-image and astrophysical datasets show improved generation of unseen-domain samples, particularly when domain gaps are large, while preserving the original-domain quality. This tuning-free, data-efficient paradigm broadens the applicability of diffusion models to data-sparse scientific domains and suggests avenues for further cross-domain sampling strategies.

Abstract

In this work, we investigate the domain generalization capabilities of diffusion models in the context of synthesizing images that are distinct from the training data. Instead of fine-tuning, we tackle this challenge from a sampling-based perspective using frozen, pre-trained diffusion models. Specifically, we demonstrate that arbitrary out-of-domain (OOD) images establish Gaussian priors in the latent spaces of a given model after inversion, and that these priors are separable from those of the original training domain. This OOD latent property allows us to synthesize new images of the target unseen domain by discovering qualified OOD latent encodings in the inverted noisy spaces, without altering the pre-trained models. Our cross-model and cross-domain experiments show that the proposed sampling-based method can expand the latent space and generate unseen images without impairing the generation quality of the original domain. We also showcase a practical application of our approach using astrophysical data, highlighting the potential of this generalization paradigm in data-sparse fields such as scientific exploration.
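
The pipeline described above (invert a few OOD images, model the inverted latents as a Gaussian, sample from it, and denoise deterministically with the frozen model) can be illustrated with a toy sketch. This is not the authors' implementation: the analytic linear noise predictor standing in for a pre-trained network, the 64-dimensional "images", the schedule, the diagonal Gaussian fit, and all names (`eps_model`, `ddim_step`, `invert`, `denoise`) are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, T = 64, 500
S_ID = 0.5  # assumed toy in-domain data distribution: N(0, S_ID^2 I)

# Cumulative signal level alpha_bar_t, from 1 (clean) down to ~0 (noise).
theta = np.linspace(0.0, np.pi / 2 - 0.02, T + 1)
abar = np.cos(theta) ** 2


def eps_model(x, t):
    """Frozen stand-in for a pre-trained noise predictor.

    For N(0, S_ID^2 I) data this is the exact E[eps | x_t] (an assumption
    that replaces a real network in this toy setting).
    """
    var_t = abar[t] * S_ID**2 + 1.0 - abar[t]
    return np.sqrt(1.0 - abar[t]) * x / var_t


def ddim_step(x, t_from, t_to):
    """One deterministic DDIM update (eta = 0) from step t_from to t_to."""
    eps = eps_model(x, t_from)
    x0_pred = (x - np.sqrt(1.0 - abar[t_from]) * eps) / np.sqrt(abar[t_from])
    return np.sqrt(abar[t_to]) * x0_pred + np.sqrt(1.0 - abar[t_to]) * eps


def invert(x0):
    """DDIM inversion: run the deterministic update from t = 0 up to t = T."""
    x = x0
    for t in range(T):
        x = ddim_step(x, t, t + 1)
    return x


def denoise(z):
    """Deterministic DDIM sampling: run the update from t = T down to t = 0."""
    x = z
    for t in range(T, 0, -1):
        x = ddim_step(x, t, t - 1)
    return x


# A few "unseen-domain" samples: same shape, shifted mean (toy OOD domain).
ood_shots = 2.0 + 0.5 * rng.standard_normal((32, DIM))

# 1) Invert the OOD samples and fit a diagonal Gaussian to their latents.
z_ood = invert(ood_shots)
mu_hat, sigma_hat = z_ood.mean(axis=0), z_ood.std(axis=0)

# 2) Sample fresh OOD latents and denoise them with the frozen model.
z_new = mu_hat + sigma_hat * rng.standard_normal((256, DIM))
x_new = denoise(z_new)

# 3) Sanity checks: round-trip error, and untouched in-domain sampling.
recon_err = np.linalg.norm(denoise(z_ood) - ood_shots) / np.linalg.norm(ood_shots)
z_id = rng.standard_normal((256, DIM))
x_id = denoise(z_id)

print("relative reconstruction error:", recon_err)
print("mean latent norm (in-domain, OOD):",
      np.linalg.norm(z_id, axis=1).mean(), np.linalg.norm(z_ood, axis=1).mean())
print("generated OOD mean:", x_new.mean(), "in-domain std:", x_id.std())
```

In this linear toy case the inverted OOD latents sit far from the standard-normal shell where in-domain latents live, which is a simplified analogue of the separability property stated in the abstract. A real model would be nonlinear, so the Gaussian fit there is only an approximation.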

Paper Structure

This paper contains 28 sections, 2 theorems, 14 equations, 9 figures, 3 tables, 1 algorithm.

Key Result

Lemma C.1

For $q_\sigma(\mathbf{x}_{1:T}|\mathbf{x}_0)$ defined in Eq. (9) and $q_\sigma(\mathbf{x}_{t-1}|\mathbf{x}_t,\mathbf{x}_0)$ defined in Eq. (10), we have:

Figures (9)

  • Figure 1: Illustration of the trajectory-tuning based paradigm (left) and our proposed latent-sampling based paradigm (right) for OOD image synthesis with diffusion models. Given a DM $p_\theta$ pre-trained on images from domain $\mathcal{D}_{id}$, most existing methods fine-tune the generation trajectories, yielding $p_{\theta'}$, to synthesize data in a new domain $\mathcal{D}_{od}$. In contrast, we propose to discover unseen latent encodings that achieve the same goal with the frozen model $p_\theta$ by expanding the latent spaces.
  • Figure 2: Examples of synthesized OOD images across data domains and methods. (a) All the OOD samples are obtained via our proposed sampling method using a DM pre-trained on AFHQ-Dog (Choi et al., 2020). (b) The same sampling method can also be applied to achieve style transfer (e.g., between RGB and sketch images).
  • Figure 3: Various visualizations of "mode interference". The example setting is synthesizing human faces from DDPMs trained on dogs. (a): An interfered image of a human face gradually becomes similar to the original training domain as the denoising trajectory shifts from deterministic ($\eta = 0$) to stochastic ($\eta = 1$). (b): Failure cases happen when sampled latent OOD encodings are captured by the model's original probability concentration mass. (c): Inverted OOD latent encodings preserve slightly perceptible low-level visual features and are not perfect Gaussians but rather approximations.
  • Figure 4: Illustration of various geometric properties of high-dimensional Gaussians. (a) and (b) show that the probability mass concentrates in a thin annulus around the equator. (c) illustrates the geometric observation on the orthogonality of sample pairs. (d) illustrates the idea of separating two Gaussian distributions in high-dimensional spaces.
  • Figure 5: Illustration of unseen trajectory bandwidth at different diffusion steps. We show qualitative examples using the iDDPM (Nichol and Dhariwal, 2021) trained on AFHQ-Dog-256 as the base model; the church and cat examples are both unseen-domain images. The images in green boxes indicate the bandwidth we have empirically selected to preserve the reconstruction quality. Compared to the trained image domain (i.e., dogs), cats have a smaller domain gap than churches. Contrary to the conventional understanding that a smaller domain gap makes generalization from a trained model better and easier, we observe that a larger domain gap signifies a larger bandwidth, making it easier to perform the OOD sampling and synthesis.
  • ...and 4 more figures
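
The geometric properties summarized in the Figure 4 caption (norm concentration and near-orthogonality of sample pairs) can be checked numerically. The sketch below is only an illustration with arbitrarily chosen dimensions and sample counts, and the `stats` dictionary is a hypothetical name; it is not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
stats = {}

for dim in (2, 64, 4096):
    x = rng.standard_normal((1000, dim))

    # Norm concentration: ||x|| clusters around sqrt(dim) as dim grows.
    norms = np.linalg.norm(x, axis=1)
    rel_spread = norms.std() / np.sqrt(dim)

    # Near-orthogonality: cosine similarity of independent pairs shrinks to 0.
    a, b = x[:500], x[500:]
    cos = np.sum(a * b, axis=1) / (
        np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    )

    stats[dim] = (rel_spread, np.abs(cos).mean())
    print(f"dim={dim:5d}  norm spread / sqrt(dim) = {rel_spread:.4f}  "
          f"mean |cos| = {np.abs(cos).mean():.4f}")
```

Both quantities shrink as the dimension grows, which is the intuition behind treating two Gaussians with different means or scales as separable in a high-dimensional latent space.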

Theorems & Definitions (2)

  • Lemma C.1
  • Lemma C.2