Diffusion Model Conditioning on Gaussian Mixture Model and Negative Gaussian Mixture Gradient

Weiguo Lu, Xuan Wu, Deng Ding, Jinqiao Duan, Jirong Zhuang, Gangnan Yuan

TL;DR

This work advances diffusion modeling by conditioning on Gaussian Mixture Model–constructed latent distributions, treating conditioning inputs as distributions rather than fixed values. The authors provide a set-theoretic motivation showing feature-based conditioning yields fewer defects than class-based conditioning, and they introduce a classifier with a novel Negative Gaussian Mixture Gradient (NGMG) to stabilize training. They connect NGMG to the Wasserstein distance, proving similar convergence properties while enabling entropy-aware training that outperforms standard BCE in selective tasks. Experiments on CelebA demonstrate that GMM-conditioned latent spaces can produce high-quality, attribute-consistent generations, and the approach offers a principled path to robust, distributional conditioning in diffusion-based generation.

Abstract

Diffusion models (DMs) are a class of generative models that have had a major impact on image synthesis and beyond, achieving state-of-the-art results across a range of generative tasks. A wide variety of conditioning inputs, such as text or bounding boxes, can be used to control generation. In this work, we propose a conditioning mechanism that uses Gaussian mixture models (GMMs) as feature conditioning to guide the denoising process. Based on set theory, we provide a theoretical analysis showing that the conditional latent distributions based on features and on classes differ significantly, so that conditioning the latent distribution on features produces fewer defective generations than conditioning on classes. Two diffusion models conditioned on the Gaussian mixture model are trained separately for comparison, and the experiments support our findings. We also propose a novel gradient function, the negative Gaussian mixture gradient (NGMG), and apply it in diffusion model training with an additional classifier, which improves training stability. Finally, we prove theoretically that NGMG shares the benefit of the Earth Mover (Wasserstein) distance as a more sensible cost function when learning distributions supported by low-dimensional manifolds.

Paper Structure

This paper contains 19 sections, 3 theorems, 51 equations, 14 figures, and 1 algorithm.

Key Result

Proposition 2.1

Under the GMM expansion setup, the 1-Wasserstein distance between two distributions is given by: where $I$ is the identity matrix and $B$ is a matrix given by:
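
For readers unfamiliar with the quantity in this proposition, the following is a minimal sketch of the 1-Wasserstein distance in the simplest setting: two one-dimensional empirical distributions with the same number of samples, where the distance reduces to the mean absolute difference between sorted samples. It is not the paper's GMM-expansion formula, which is not reproduced here; the function name `wasserstein1_1d` and the example values are assumptions made for illustration.

```python
import numpy as np


def wasserstein1_1d(xs, ys):
    """1-Wasserstein distance between two equal-size 1-D samples.

    For equal-size empirical distributions on the real line, the optimal
    transport plan matches sorted samples, so W1 is the mean absolute gap.
    """
    xs = np.sort(np.asarray(xs, dtype=float))
    ys = np.sort(np.asarray(ys, dtype=float))
    if xs.shape != ys.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(xs - ys)))


# Two point masses, one at 0 and one at theta, have disjoint supports,
# yet W1 still varies smoothly with theta: it equals |theta|.
theta = 0.25
print(wasserstein1_1d(np.zeros(5), np.full(5, theta)))  # 0.25
```

The point-mass example illustrates the property the abstract appeals to: the distance stays finite and informative even when the two distributions do not overlap.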

Figures (14)

  • Figure 1: Image comparison. Left: real images from CelebA. Right: randomly generated samples from a diffusion model conditioned on a Gaussian mixture model. Both sets of images share the same feature condition.
  • Figure 2: Image generation using a conditional feature distribution. When the model is given the conditional distribution of $\mathcal{Z}$, it has two sources of uncertainty: the Gaussian denoising process and the conditional distribution itself.
  • Figure 3: Image generation using a fixed $\mathcal{Z}$. When a particular sample of $\mathcal{Z}$ is given, the only uncertainty comes from the Gaussian denoising process. Each training image is assigned a sample value from $\mathcal{Z}$.
  • Figure 4: Our sampling process repeatedly returns to $x_0$; more detail is given in the appendix. $x_0^*$ is the predicted $x_0$, and $\beta$ is the variable that controls the noise size in the diffusion process. The sampling process is given by the denoising equation. Our model is trained with comparatively large $\beta$ and takes only $T=100$ diffusion steps. The denoising process starts from $x_T$, which is Gaussian noise.
  • Figure 5: Probability event space of class and feature. Left: the latent event space conditioned on class. Right: the latent event space conditioned on feature.
  • ...and 9 more figures
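
The captions for Figures 2 to 4 describe drawing a condition from a Gaussian mixture and a sampler that predicts $x_0$ at every step before re-noising. A rough sketch of that idea follows. It is not the paper's denoising equation, which is not reproduced here: the linear $\beta$ schedule and its values, the one-dimensional mixture, and the stand-in predictor `toy_predict_x0` are all hypothetical. Only $T=100$ and the Gaussian start $x_T$ come from the captions.

```python
import numpy as np

T = 100  # number of diffusion steps, as stated in the Figure 4 caption


def make_schedule(T, beta_min=1e-3, beta_max=0.2):
    """Linear beta schedule (hypothetical values) and cumulative alpha products."""
    betas = np.linspace(beta_min, beta_max, T)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars


def sample_gmm(rng, weights, means, stds):
    """Draw one conditioning value z from a 1-D Gaussian mixture."""
    k = rng.choice(len(weights), p=weights)  # pick a mixture component
    return rng.normal(means[k], stds[k])


def toy_predict_x0(x_t, t, z):
    """Stand-in for a trained network: pulls x_t toward the condition z."""
    return 0.5 * x_t + 0.5 * z


def sample(rng, shape, z, alpha_bars, predict_x0=toy_predict_x0):
    """Predict x_0 at each step, then re-noise it to the previous noise level."""
    x = rng.standard_normal(shape)  # x_T is pure Gaussian noise
    for t in range(len(alpha_bars), 0, -1):
        x0_star = predict_x0(x, t, z)  # predicted x_0 at step t
        if t == 1:
            return x0_star  # final step: keep the prediction itself
        a_prev = alpha_bars[t - 2]  # cumulative alpha for step t-1
        noise = rng.standard_normal(shape)
        x = np.sqrt(a_prev) * x0_star + np.sqrt(1.0 - a_prev) * noise
    return x


rng = np.random.default_rng(0)
_, alpha_bars = make_schedule(T)
z = sample_gmm(rng, [0.3, 0.7], [-2.0, 2.0], [0.5, 0.5])
x0 = sample(rng, (4, 4), z, alpha_bars)
```

Drawing a fresh `z` for each call corresponds to the setting of Figure 2, where the condition itself adds uncertainty. Reusing one fixed `z` corresponds to Figure 3, where the only randomness comes from the denoising noise.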

Theorems & Definitions (6)

  • Proposition 2.1
  • proof
  • Proposition 4.1
  • proof
  • Proposition 4.2
  • proof