
EvolvED: Evolutionary Embeddings to Understand the Generation Process of Diffusion Models

Vidya Prasad, Hans van Gorp, Christina Humer, Ruud J. G. van Sloun, Anna Vilanova, Nicola Pezzotti

TL;DR

This paper tackles the challenge of understanding the iterative, high-dimensional evolution of diffusion-model outputs by introducing EvolvED, a holistic pipeline that combines user-defined prompts, intermediate-data sampling, external feature extraction, and a novel evolutionary embedding that explicitly encodes iterations. The core innovation is an embedding objective with semantic, displacement, and alignment losses that enable simultaneous preservation of semantic neighborhoods and cross-iteration coherence, rendered in both rectilinear and radial layouts. Empirically, EvolvED demonstrates its utility on GLIDE and Stable Diffusion across use cases involving ImageNet objects, human facial features, and image styles, with quantitative metrics showing competitive neighborhood preservation and substantial computational efficiency relative to vanilla t-SNE. The work offers practical insights for interpretable diffusion-model development, including guidance on feature extractors, prompt design, and scheduler effects, with potential to inform model design, training, and controllability in generative AI.

Abstract

Diffusion models, widely used in image generation, rely on iterative refinement to generate images from noise. Understanding this data evolution is important for model development and interpretability, yet challenging due to its high-dimensional, iterative nature. Prior works often focus on static or instance-level analyses, missing the iterative and holistic aspects of the generative path. While dimensionality reduction can visualize image evolution for a few instances, it does not preserve the iterative structure. To address these gaps, we introduce EvolvED, a method that presents a holistic view of the iterative generative process in diffusion models. EvolvED goes beyond instance exploration by leveraging predefined research questions to streamline exploration of the generative space. Tailored prompts aligned with these questions are used to extract intermediate images, preserving iterative context. Targeted feature extractors trace the evolution of key image attributes, addressing the complexity of high-dimensional outputs. Central to EvolvED is a novel evolutionary embedding algorithm that encodes iterative steps while maintaining semantic relations. It enhances the visualization of data evolution by clustering semantically similar elements within each iteration with t-SNE, grouping elements by iteration, and aligning an instance's elements across iterations. We present rectilinear and radial layouts to represent iterations and support exploration. We apply EvolvED to diffusion models such as GLIDE and Stable Diffusion, demonstrating its ability to provide valuable insights into the generative process.
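To make the three embedding objectives concrete, below is a minimal PyTorch sketch of what such a combined loss could look like for the rectilinear layout. All names here (`evolutionary_embedding_loss`, the weights `w_s`, `w_d`, `w_a`) are ours, `P` is assumed to be a precomputed, block-diagonal per-iteration t-SNE affinity matrix, and the paper's exact formulation may differ.

```python
import torch

def evolutionary_embedding_loss(emb, P, iter_ids, inst_ids, x_bar,
                                sigma=1.0, w_s=1.0, w_d=1.0, w_a=1.0):
    """Sketch of a three-term evolutionary embedding objective (our naming).

    emb      -- (N, 2) embedding positions, requires_grad=True
    P        -- (N, N) high-dim t-SNE affinities, assumed block-diagonal per iteration
    iter_ids -- (N,) long tensor: diffusion iteration t of each element
    inst_ids -- (N,) long tensor: instance id of each element
    x_bar    -- (T,) tensor: target x displacement per iteration t
    """
    # C_s: semantic loss -- the standard t-SNE KL divergence, which clusters
    # semantically similar elements within each iteration.
    d2 = torch.cdist(emb, emb) ** 2
    Q = 1.0 / (1.0 + d2)
    Q.fill_diagonal_(0.0)
    Q = Q / Q.sum()
    C_s = torch.sum(P * torch.log((P + 1e-12) / (Q + 1e-12)))

    # C_d: displacement loss -- a Gaussian well centered at x_bar[t] pulls every
    # element of iteration t into that iteration's band (low cost near x_bar[t]).
    C_d = torch.mean(1.0 - torch.exp(-(emb[:, 0] - x_bar[iter_ids]) ** 2
                                     / (2 * sigma ** 2)))

    # C_a: alignment loss -- an instance's elements should share a similar y
    # across iterations, penalized here as deviation from the instance mean.
    n = int(inst_ids.max()) + 1
    sum_y = torch.zeros(n).index_add_(0, inst_ids, emb[:, 1])
    cnt = torch.zeros(n).index_add_(0, inst_ids, torch.ones_like(emb[:, 1]))
    C_a = torch.mean((emb[:, 1] - (sum_y / cnt.clamp(min=1.0))[inst_ids]) ** 2)

    return w_s * C_s + w_d * C_d + w_a * C_a
```

Minimizing this with any gradient-based optimizer (e.g., Adam on `emb`) would yield the rectilinear layout; the radial layout replaces $x$ with the radius $r$ and $y$ with the angle $\theta$.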


Paper Structure

This paper contains 19 sections, 11 equations, and 18 figures.

Figures (18)

  • Figure 1: Theoretical representation of the diffusion process [song2021scorebased], where the training data distribution $u(\textbf{h}^i_0)$ is represented as a 1D Gaussian. During generation (left to right), instances evolve from noise $\textbf{h}^i_T$ to a coherent image $\textbf{h}^i_0$. Intermediate steps also produce fuzzy, denoised estimates $\textbf{h}^i_{t_0}$, which gradually refine into clearer images on convergence.
  • Figure 2: Evolutionary embedding process for the (a) rectilinear and (b) radial layout. Elements of iteration $t$ are attracted to dark, low-cost areas based on a Gaussian centered at a displacement value ($\bar{x}_t$ or $\bar{r}_t$) per $t$. Elements of an instance across iterations are aligned based on their $y$ or $\theta$ (see the coordinate sketch after this list).
  • Figure 3: Applying evolutionary embedding on GLIDE images with various feature extractors. (a) Noisy $\textbf{h}_{t-1}$ and (d) step-wise denoised images $\textbf{h}_{t_0}$ undergo evolutionary embedding. The noisy $\textbf{h}_{t-1}$ are encoded with (b) a CLIP image encoder and (c) a robust ImageNet classifier [bai2021transformers] before the embedding creation. Similarly, the denoised $\textbf{h}_{t_0}$ are encoded with (e) a CLIP image encoder and (f) a robust ImageNet classifier.
  • Figure 4: Single vanilla t-SNE [van2008tsne] embedding on all $\widehat{\textbf{h}}_{t-1}$ extracted from the GLIDE ImageNet use case with a classifier [bai2021transformers]. Points are colored by (a) iteration $t$ and (b) prompt. Radial layout without (d) and with (e) the instance alignment loss. Pathways of "tables", "sharks", and "fruits" are shown on the vanilla t-SNE (c) and the radial layout (f).
  • Figure 5: $Q^t_\text{trust}$ (left) and $Q^t_\text{cont}$ (right) across iterations $t$ for full vanilla t-SNE, step-wise vanilla t-SNE per $t$ ($C_s$), and the proposed radial ($radial$) and rectilinear ($rect$) layouts without the alignment loss ($C_s$ with $C_d$) and with it ($C_s$, $C_d$, and $C_a$). Embeddings are obtained on $\widehat{\textbf{h}}{}^i_{t-1}$ (see the metric sketch after this list).
  • ...and 13 more figures
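As a companion to the loss sketch above, the following hypothetical helper (ours, not the paper's API) shows how the displacement and alignment losses in Figure 2 switch between the two layouts: in the radial layout the attracted coordinate becomes the radius $r$ (toward $\bar{r}_t$) and the aligned coordinate becomes the angle $\theta$.

```python
import torch

def layout_coordinates(emb, radial=False):
    """Map 2-D positions to (displaced, aligned) coordinates per Figure 2.

    Rectilinear (Fig. 2a): attract x toward x_bar[t], align instances on y.
    Radial (Fig. 2b):      attract r toward r_bar[t], align instances on theta.
    """
    if radial:
        r = torch.linalg.norm(emb, dim=1)          # radius: displaced coordinate
        theta = torch.atan2(emb[:, 1], emb[:, 0])  # angle: aligned coordinate
        # note: theta wraps at +/- pi, which an alignment penalty must tolerate
        return r, theta
    return emb[:, 0], emb[:, 1]                    # x displaced, y aligned
```

With this substitution, the same Gaussian displacement and instance-alignment penalties from the earlier sketch apply unchanged.

For the neighborhood-preservation metrics reported in Figure 5, a hedged sketch using scikit-learn: $Q^t_\text{trust}$ can be computed with `sklearn.manifold.trustworthiness`, and continuity can be obtained by swapping the roles of the two spaces, a common identity. The paper's exact definitions of $Q^t_\text{trust}$ and $Q^t_\text{cont}$ may differ, and `k` below is illustrative.

```python
from sklearn.manifold import trustworthiness

def q_trust_cont(feats_t, emb_t, k=15):
    """Per-iteration neighborhood metrics (our helper, not the paper's code).

    feats_t -- (N_t, D) features for iteration t, e.g. encoded h_{t-1}
    emb_t   -- (N_t, 2) their 2-D embedding positions at iteration t
    """
    q_trust = trustworthiness(feats_t, emb_t, n_neighbors=k)
    # continuity(X -> Y) equals trustworthiness(Y -> X)
    q_cont = trustworthiness(emb_t, feats_t, n_neighbors=k)
    return q_trust, q_cont
```

Evaluating this pair per iteration $t$, on the extracted features and their embedded positions, would produce curves of the kind shown in Figure 5.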
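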