Diffusion Explainer: Visual Explanation for Text-to-image Stable Diffusion

Seongmin Lee; Benjamin Hoover; Hendrik Strobelt; Zijie J. Wang; ShengYun Peng; Austin Wright; Kevin Li; Haekyu Park; Haoyang Yang; Duen Horng Chau

TL;DR

Diffusion Explainer is the first interactive visualization tool that explains how Stable Diffusion transforms text prompts into images. It tightly integrates a visual overview of Stable Diffusion’s complex structure with explanations of the underlying operations.

Abstract

The impressive ability of diffusion-based generative models to create convincing images has garnered global attention. However, their complex structures and operations are often difficult for non-experts to grasp. We present Diffusion Explainer, the first interactive visualization tool that explains how Stable Diffusion transforms text prompts into images. Diffusion Explainer tightly integrates a visual overview of Stable Diffusion's complex structure with explanations of the underlying operations. By comparing the image generation of prompt variants, users can discover how keyword changes affect the generated image. A 56-participant user study demonstrates that Diffusion Explainer offers substantial learning benefits to non-experts. Our tool has been used by over 10,300 users from 124 countries at https://poloclub.github.io/diffusion-explainer/.

Paper Structure (11 sections, 6 figures)

Introduction
Related Works
Design Goals
System Design and Implementation
    Overview
    Architecture View
    Refinement Comparison View
Human Evaluation
    Procedure
    Results and Design Lessons
Conclusion

Figures (6)

Figure 1: To learn how Stable Diffusion converts a text prompt into vector representations, users click the Text Representation Generator, which smoothly expands to (A) the Text Operation View, explaining how the prompt is split into tokens and encoded into vector representations. (B) The Text-image Linkage Explanation demonstrates how Stable Diffusion bridges text and image, enabling text representations to guide the image generation process.
Figure 2: To learn how Stable Diffusion refines noise into the vector representation of a high-resolution image aligned with the text prompt, users click the Image Representation Refiner, which smoothly expands to (A) the Image Operation View, demonstrating how noise is predicted and removed from the image representation. (B) The Interactive Guidance Explanation lets users experiment with different guidance scale values (0, 1, 7, 20) to better understand how higher values lead to stronger adherence to the text prompt.
Figure 3: The Refinement Comparison View enables users to discover the impact of prompts on image generation by using UMAP to compare how image representations evolve over refinement timesteps when guided by two related text prompts. Adding the phrase "pixar" makes the generated bunny's style more cartoony and more vibrant in color and texture while preserving its pose.
Figure 4: Diffusion Explainer is more usable than a blog post.
Figure 5: All features were rated highly.
...and 1 more figure
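
The guidance scale values in Figure 2 (0, 1, 7, 20) can be illustrated with a minimal sketch. It assumes the standard classifier-free guidance formulation, in which the noise predicted without the prompt is pushed toward the noise predicted with the prompt; the page above does not state the formula. The function name `guided_noise` and the three-element toy vectors are hypothetical and stand in for the much larger noise tensors a real model would predict.

```python
from typing import List


def guided_noise(
    noise_uncond: List[float],
    noise_cond: List[float],
    guidance_scale: float,
) -> List[float]:
    """Combine two noise predictions with classifier-free guidance.

    Assumed formulation: uncond + scale * (cond - uncond).
    A scale of 0 ignores the prompt, 1 uses the prompt-conditioned
    prediction as-is, and larger values extrapolate past it.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


# Toy noise predictions (hypothetical values, for illustration only).
noise_uncond = [0.5, -1.0, 0.25]  # predicted without the text prompt
noise_cond = [1.0, -0.5, 0.0]     # predicted with the text prompt

for scale in (0, 1, 7, 20):
    print(scale, guided_noise(noise_uncond, noise_cond, scale))
```

Under this assumption, a scale of 0 reproduces the unconditioned prediction and a scale of 1 reproduces the prompt-conditioned one. Scales of 7 and 20 move progressively further along the direction from the former to the latter, which is one way to read "stronger adherence" to the prompt.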
