EvolvED: Evolutionary Embeddings to Understand the Generation Process of Diffusion Models
Vidya Prasad, Hans van Gorp, Christina Humer, Ruud J. G. van Sloun, Anna Vilanova, Nicola Pezzotti
TL;DR
This paper tackles the challenge of understanding the iterative, high-dimensional evolution of diffusion-model outputs by introducing EvolvED, a holistic pipeline that combines user-defined prompts, intermediate-data sampling, external feature extraction, and a novel evolutionary embedding that explicitly encodes iterations. The core innovation is an embedding objective with semantic, displacement, and alignment losses that enable simultaneous preservation of semantic neighborhoods and cross-iteration coherence, rendered in both rectilinear and radial layouts. Empirically, EvolvED demonstrates its utility on GLIDE and Stable Diffusion across use cases involving ImageNet objects, human facial features, and image styles, with quantitative metrics showing competitive neighborhood preservation and substantially lower computational cost than vanilla t-SNE. The work offers practical insights for interpretable diffusion-model development, including guidance on feature extractors, prompt design, and scheduler effects, with potential to inform model design, training, and controllability in generative AI.
Abstract
Diffusion models, widely used in image generation, rely on iterative refinement to generate images from noise. Understanding this data evolution is important for model development and interpretability, yet challenging due to its high-dimensional, iterative nature. Prior works often focus on static or instance-level analyses, missing the iterative and holistic aspects of the generative path. While dimensionality reduction can visualize image evolution for a few instances, it does not preserve the iterative structure. To address these gaps, we introduce EvolvED, a method that presents a holistic view of the iterative generative process in diffusion models. EvolvED goes beyond instance exploration by leveraging predefined research questions to streamline generative space exploration. Tailored prompts aligned with these questions are used to extract intermediate images, preserving iterative context. Targeted feature extractors trace the evolution of key image attributes, addressing the complexity of high-dimensional outputs. Central to EvolvED is a novel evolutionary embedding algorithm that encodes iterative steps while maintaining semantic relations. It enhances the visualization of data evolution by clustering semantically similar elements within each iteration with t-SNE, grouping elements by iteration, and aligning an instance's elements across iterations. We present rectilinear and radial layouts to represent iterations and support exploration. We apply EvolvED to diffusion models such as GLIDE and Stable Diffusion, demonstrating its ability to provide valuable insights into the generative process.
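The abstract names three ingredients of the evolutionary embedding (semantic neighborhood preservation within each iteration, grouping of elements by iteration, and alignment of an instance's elements across iterations) without giving the formulation. The sketch below is a minimal, hypothetical illustration of how such a three-term objective could look, not the paper's actual algorithm: `semantic_loss` is a crude distance-matching stand-in for the t-SNE-based term, `band_x`, the loss weights, and the toy data are all invented for illustration, and only the two quadratic layout terms are optimized.

```python
import numpy as np

rng = np.random.default_rng(0)

def semantic_loss(emb, feats):
    # Crude stand-in for a t-SNE-style term: within each iteration, match
    # normalized pairwise 2-D distances to high-dimensional feature distances.
    loss = 0.0
    for t in range(emb.shape[0]):
        d_hi = np.linalg.norm(feats[t][:, None] - feats[t][None], axis=-1)
        d_lo = np.linalg.norm(emb[t][:, None] - emb[t][None], axis=-1)
        loss += np.mean((d_hi / d_hi.max() - d_lo / max(d_lo.max(), 1e-9)) ** 2)
    return loss / emb.shape[0]

def displacement_loss(emb, band_x):
    # Pull every point of iteration t toward band coordinate band_x[t]
    # (rectilinear layout: each iteration becomes a vertical band).
    return np.mean((emb[..., 0] - band_x[:, None]) ** 2)

def alignment_loss(emb):
    # Keep each instance's y-coordinate coherent across consecutive iterations.
    return np.mean((emb[1:, :, 1] - emb[:-1, :, 1]) ** 2)

T, N = 4, 32                          # iterations, instances (toy sizes)
feats = rng.normal(size=(T, N, 8))    # toy per-iteration high-dimensional features
emb = rng.normal(size=(T, N, 2))      # one 2-D point per instance per iteration
band_x = np.arange(T, dtype=float)    # one band position per iteration

disp0, align0 = displacement_loss(emb, band_x), alignment_loss(emb)

# Gradient descent on the two quadratic layout terms; the semantic term is
# only monitored here to keep the sketch short.
for _ in range(200):
    grad = np.zeros_like(emb)
    grad[..., 0] += 2 * (emb[..., 0] - band_x[:, None]) / emb[..., 0].size
    dy = emb[1:, :, 1] - emb[:-1, :, 1]
    grad[1:, :, 1] += 2 * dy / dy.size
    grad[:-1, :, 1] -= 2 * dy / dy.size
    emb -= 5.0 * grad

disp1, align1 = displacement_loss(emb, band_x), alignment_loss(emb)
print(f"displacement: {disp0:.3f} -> {disp1:.3f}, alignment: {align0:.3f} -> {align1:.3f}")
```

After the descent, points of each iteration collapse onto their band while each instance's vertical position stabilizes across iterations, mimicking the grouping and alignment behavior the abstract describes; a full implementation would jointly optimize the semantic term as well.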
