PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation

Yuhan Guo, Hanning Shao, Can Liu, Kai Xu, Xiaoru Yuan

TL;DR

This work introduces the Image Variant Graph (IVG) and the PrompTHis system to visualize the prompt-editing process in text-to-image generation. By representing prompt differences as edges between image nodes and projecting images by visual similarity, the approach lets users analyze how word-level changes influence outputs, supports both macro- and micro-level understanding of model behavior, and aids creative planning. The system combines the main IVG visualization with a detailed prompt-image history, a navigation mini-map, and a creation panel. It is evaluated through a quantitative user study (n=11) and qualitative interviews (n=6) with professional and amateur users. Results show improved review, planning, and sensemaking of generative model behavior, with participants highlighting the usefulness of the edge representations for tracing word effects and guiding subsequent prompts.

Abstract

Generative text-to-image models, which allow users to create appealing images through a text prompt, have seen a dramatic increase in popularity in recent years. However, most users have a limited understanding of how such models work, and it often takes much trial and error to achieve satisfactory results. The prompt history contains a wealth of information that could give users insight into what has been explored and how prompt changes affect the output image, yet little research attention has been paid to visual analysis of this process to support users. We propose the Image Variant Graph, a novel visual representation designed to support comparing prompt-image pairs and exploring the editing history. The Image Variant Graph models prompt differences as edges between corresponding images and presents the distances between images through projection. Based on the graph, we developed the PrompTHis system through co-design with artists. Besides the Image Variant Graph, PrompTHis also incorporates a detailed prompt-image history and a navigation mini-map. By reviewing and analyzing the prompting history, users can better understand the impact of prompt changes and gain more effective control over image generation. A quantitative user study with eleven amateur participants and qualitative interviews with five professionals and one amateur user were conducted to evaluate the effectiveness of PrompTHis. The results demonstrate that PrompTHis can help users review the prompt history, make sense of the model, and plan their creative process.

Paper Structure (26 sections, 2 equations, 9 figures)

Figures (9)

  • Figure 1: In the Image Variant Graph, the nodes are the images and the edges are the differences between prompts, with one edge for each difference. Weighting algorithms are then applied to filter out less important edges. Finally, a novel layout algorithm and visual encoding are used to enhance scalability.
  • Figure 2: An example showing that not all the word modifications have the same impact on the image: While "white" causes the color of the vase to change, "besides a computer" does not have an obvious impact.
  • Figure 3: Visual encoding of Image Variant Graph. Image relationships are indicated by bubbles and the word modifications are represented by glyphs.
  • Figure 4: Pipeline of edge derivation. The text pre-processing stage compares the prompts to identify the word modifications and derive the original set of edges. Image pre-processing involves embedding images based on text and image encoding, combining the embedding, and clustering images. Edges are then bundled based on the clusters. The impact of word modification on image change is calculated as edge weight, which is used to filter out low-impact edges.
  • Figure 5: Three-step comparison of two prompts to identify word-level modifications. First, the Myers comparison algorithm (Myers, 1986) is applied to calculate the inserted and deleted words. Then, the changed words are matched to identify the reordered words. Finally, the weights of the matched words are compared.
  • ...and 4 more figures
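
The first two steps of the prompt comparison described in the Figure 5 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in for the Myers algorithm, splits prompts on whitespace only, and omits the third step (comparing word weights) because the weight syntax is not described here. The function name `diff_prompts` and the returned dictionary keys are hypothetical.

```python
from difflib import SequenceMatcher


def diff_prompts(old_prompt, new_prompt):
    """Classify word-level changes between two prompts.

    Assumption: words are whitespace-separated tokens. SequenceMatcher
    is a stand-in for the Myers diff named in the caption.
    """
    old_words = old_prompt.split()
    new_words = new_prompt.split()

    # Step 1: collect the words deleted from the old prompt and the
    # words inserted into the new prompt.
    deleted, inserted = [], []
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(old_words[i1:i2])
        if tag in ("insert", "replace"):
            inserted.extend(new_words[j1:j2])

    # Step 2: a word that appears as both deleted and inserted was
    # moved rather than changed, so report it as reordered.
    reordered = []
    for word in list(deleted):
        if word in inserted:
            deleted.remove(word)
            inserted.remove(word)
            reordered.append(word)

    return {"inserted": inserted, "deleted": deleted, "reordered": reordered}
```

For example, `diff_prompts("a vase on a table", "a white vase on a table")` reports "white" as the only inserted word. In the Image Variant Graph, each such word modification would become one edge between the two corresponding images.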