Understanding Hallucinations in Diffusion Models through Mode Interpolation

Sumukh K Aithal, Pratyush Maini, Zachary C. Lipton, J. Zico Kolter

TL;DR

Diffusion-based image generators can hallucinate through mode interpolation: samples fill the gaps between nearby data modes and land outside the support of the training distribution. The authors trace this to a learned score function that is smoother than the true one and cannot reproduce the sharp jumps that separate modes. They illustrate the effect on 1D and 2D Gaussian mixtures and on simple shapes, then extend it to a dataset of real hand images. They introduce Hal, a metric based on the variance of the sampling trajectory, to detect hallucinations, and report that it removes over 95% of hallucinated samples at generation time while retaining 96% of in-support samples. The work also shows that hallucinations amplify distribution drift under recursive training on synthetic data, and that detecting them during generation or filtering them before retraining mitigates model collapse, with experiments on MNIST and 2D Gaussians. Overall, the study provides a concrete mechanism for hallucination in diffusion models and a practical mitigation tool with implications for data curation and iterative model training.

Abstract

Colloquially speaking, image generation models based upon diffusion processes are frequently said to exhibit "hallucinations," samples that could never occur in the training data. But where do such hallucinations come from? In this paper, we study a particular failure mode in diffusion models, which we term mode interpolation. Specifically, we find that diffusion models smoothly "interpolate" between nearby data modes in the training set, to generate samples that are completely outside the support of the original training distribution; this phenomenon leads diffusion models to generate artifacts that never existed in real data (i.e., hallucinations). We systematically study the reasons for, and the manifestation of, this phenomenon. Through experiments on 1D and 2D Gaussians, we show how a discontinuous loss landscape in the diffusion model's decoder leads to a region where any smooth approximation will cause such hallucinations. Through experiments on artificial datasets with various shapes, we show how hallucination leads to the generation of combinations of shapes that never existed. Finally, we show that diffusion models in fact know when they go out of support and hallucinate. This is captured by the high variance in the trajectory of the generated sample over the final few steps of the backward sampling process. Using a simple metric to capture this variance, we can remove over 95% of hallucinations at generation time while retaining 96% of in-support samples. We conclude our exploration by showing the implications of such hallucination (and its removal) on the collapse (and stabilization) of recursive training on synthetic data, with experiments on the MNIST and 2D Gaussians datasets. We release our code at https://github.com/locuslab/diffusion-model-hallucination.
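
The detection idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each sample's trajectory has been recorded as a list of scalar predictions of the clean sample (one per reverse step, ordered from the noisiest step to the last), and it scores a sample by the variance of those predictions over a window of the final steps. The function names, the window size, the threshold, and the toy numbers are all hypothetical.

```python
from statistics import pvariance


def trajectory_variance(x0_predictions, window):
    """Variance of the predicted clean sample over the last `window` reverse steps."""
    if window < 2 or window > len(x0_predictions):
        raise ValueError("window must be between 2 and the trajectory length")
    return pvariance(x0_predictions[-window:])


def filter_samples(trajectories, window, threshold):
    """Return indices of trajectories whose late-step variance is at most `threshold`."""
    return [
        i
        for i, trajectory in enumerate(trajectories)
        if trajectory_variance(trajectory, window) <= threshold
    ]


# Toy 1D trajectories with made-up values, for illustration only.
stable = [0.2, 0.8, 0.95, 1.0, 1.0, 1.0]    # settles on one mode early
wavering = [0.2, 0.9, 1.8, 1.1, 1.9, 1.5]   # keeps moving between two modes

# Assumed window and threshold; in practice both would need tuning per dataset.
kept = filter_samples([stable, wavering], window=4, threshold=0.01)
```

Here `kept` contains only the index of the stable trajectory, since the wavering one has a much larger variance over its last four steps. For images, the same score could be computed per pixel and averaged, which is again an assumption rather than something stated above.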

Paper Structure (37 sections, 4 equations, 23 figures)

Figures (23)

  • Figure 1: Hallucinations in Diffusion Models: Original Dataset (Left) & Generated Dataset (Right). (Top) The original dataset consists of 64x64 images divided into three columns, each containing a triangle, square, or pentagon with a 0.5 probability of the shape being present. Each shape appears at most once per image. The generated dataset, created using an unconditional DDPM, includes some samples (hallucinations) with multiple occurrences of the same shape, which never happens in the original dataset. (Bottom) We also train an ADM (Nichol and Dhariwal, 2021) on a dataset of high-quality images of human hands and show that the diffusion model generates hallucinated images of hands with additional fingers.
  • Figure 2: Mode Interpolation in 1D Gaussian. The red curve indicates the PDF of the true data distribution $q(x)$, which is a mixture of 3 Gaussians (notice that the y-axis is in log-scale). In blue, we show a density histogram of the samples generated by a DDPM trained on varying numbers of samples from the true data distribution. For each histogram, we sampled 100 million examples from the diffusion model to observe the interpolated distribution. (a,b) show how the density of samples generated in the interpolated region decreases as the number of samples from the real distribution (used for training the DDPM) increases. (c,d) show the impact of moving one of the modes (originally at $\mu = 3$) to $\mu = 4$. The density of samples generated in the region between distant (but neighboring) modes is significantly lower than that between nearby modes.
  • Figure 3: Mode Interpolation in 2D Gaussian. The dataset consists of a mixture of 25 Gaussians arranged in a square grid, with a training set containing 100,000 samples. (a,b) The blue points represent samples generated by a DDPM, with visible density between the nearest modes of the original Gaussian mixture (in orange). These interpolated samples have near-zero probability in the original distribution. (c,d) We trained a DDPM on a rotated version of the dataset where the modes form a diamond shape. In this configuration, we see no interpolation along the x-axis, illustrating that diffusion models interpolate between nearest modes.
  • Figure 4: Explaining Mode Interpolation via Learned Score Function. The left panel shows the ground truth score function for a mixture of Gaussians across various timesteps, while the right panel illustrates the score function learned by the neural network. While the true score function exhibits sharp jumps that separate distinct modes (particularly in the initial time steps), the neural network approximates a smoother version.
  • Figure 5: Hands Dataset. We train an ADM on the Hands dataset with 5000 images (first column) and show that the generated samples (second column) include hallucinated samples (additional/missing fingers). We then apply our proposed metric to detect these hallucinated samples (third column).
  • ...and 18 more figures
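
The sharp jumps in the ground truth score described for Figure 4 can be reproduced analytically. The sketch below assumes the standard DDPM forward process $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$ and an equal-variance Gaussian mixture, for which the noised density is again a Gaussian mixture and the score has a closed form. The mode locations (1, 2, 3), the standard deviation (0.05), and the two noise levels are illustrative assumptions; only a mode at $\mu = 3$ is mentioned in the captions above.

```python
import math


def mixture_score(x, means, std, alpha_bar, weights=None):
    """Score d/dx log q_t(x) of an equal-variance Gaussian mixture after noising.

    Under x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * eps, component k
    becomes N(sqrt(alpha_bar) * mu_k, alpha_bar * std**2 + 1 - alpha_bar).
    """
    if weights is None:
        weights = [1.0 / len(means)] * len(means)
    var_t = alpha_bar * std ** 2 + (1.0 - alpha_bar)
    means_t = [math.sqrt(alpha_bar) * m for m in means]
    # Log responsibilities up to a shared constant; subtract the max for stability.
    log_terms = [
        math.log(w) - (x - m) ** 2 / (2.0 * var_t)
        for w, m in zip(weights, means_t)
    ]
    top = max(log_terms)
    resp = [math.exp(value - top) for value in log_terms]
    total = sum(resp)
    # The score is the responsibility-weighted pull toward each noised mean.
    return sum(r / total * (m - x) / var_t for r, m in zip(resp, means_t))


MEANS = [1.0, 2.0, 3.0]  # assumed mode locations
STD = 0.05               # assumed per-mode standard deviation

# Evaluate just left and right of the midpoint between the first two modes.
low_left = mixture_score(1.49, MEANS, STD, alpha_bar=0.999)   # little noise
low_right = mixture_score(1.51, MEANS, STD, alpha_bar=0.999)
high_left = mixture_score(1.49, MEANS, STD, alpha_bar=0.5)    # heavy noise
high_right = mixture_score(1.51, MEANS, STD, alpha_bar=0.5)
```

At the low noise level the score flips sign across the midpoint, pointing toward the left mode on one side and the right mode on the other, while at the high noise level the two values are nearly equal. A network that smooths over the low-noise jump would assign a small score near the midpoint, which is consistent with samples settling between modes.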