
PromptMap: Supporting Exploratory Text-to-Image Generation

Yuhan Guo, Xingyou Liu, Xiaoru Yuan, Kai Xu

TL;DR

PromptMap addresses the challenge of disorientation in exploratory text-to-image generation by introducing the Design-Exploration model and a node-link visualization that captures nonlinear thinking and structured subspaces. The system represents exploration as a dynamic interplay between prompts (designs) and subspaces (structured dimensions), visualizing subspaces as recursive grids; this enables easy history review, comparison across dimensions, and curated reuse of results. Eight qualitative interviews suggest that PromptMap supports organized, divergent exploration and sensemaking; participants valued the explicit representation of their thinking paths and the ability to manage large design spaces. For artists and designers, the approach aims to reduce cognitive load and offers a scalable, flexible workflow for open-ended creative exploration. Future work may integrate image-to-image generation and personalized recommendations.

Abstract

Text-to-image generative models can be tremendously valuable in supporting creative tasks by providing inspiration and enabling quick exploration of different design ideas. However, a common challenge is that users may still fail to find anything useful after many hours and hundreds of images. Without effective support, users can easily get lost in the vast design space, forgetting what has been tried and what has not. In this work, we first propose the Design-Exploration model to formalize the exploration process. Based on this model, we create PromptMap, an interactive visualization system that supports exploratory text-to-image generation. Its visual representation better matches the non-linear nature of such processes, making them easier to understand and follow, and its interactions help users structure the many possibilities they can explore. We evaluated the system through in-depth interviews with users.

Paper Structure

This paper contains 31 sections and 6 figures.

Figures (6)

  • Figure 1: Design-Exploration model (adapted from the data-frame model [klein2007dataframe]).
  • Figure 2: Three forms of the prompt node. (a) Prompt form, which shows the prompt, its parameters, and the generated images. (b) Input form for editing the prompt and parameters; it is used to fork an existing node, starting from a copy of its prompt and settings. (c) Image form, which contains a single image of particular interest to the user; it is created by dragging an image out of a prompt node (or a subspace node) for further exploration.
  • Figure 3: A subspace node is created by adding dimensions to a prompt: the user selects some text and sets it as a dimension via the context menu. Multiple dimensions can be added to one prompt. A subspace node is a more compact representation of a collection of related prompt nodes, since each added dimension or value could alternatively be represented as a child prompt node. A subspace node can be expanded into the corresponding series of prompt nodes.
  • Figure 4: The grid view applies dimensional stacking [ward1994xmdvtool] to visualize the subspace. The space is divided recursively (nested grids) to accommodate more than two dimensions. In the top-left grid, the green (subject) and orange (style) dimensions are the first two, mapped to the x and y axes respectively. The third dimension, "scene" (shown in blue), is encoded by further dividing each grid cell. A cell in the grid (top left) can be dragged out as a child node (bottom left) to explore even more dimensions (bottom right). Users can edit the values either through the drop-down list embedded in the prompt or in the settings panel (top right).
  • Figure 5: User curation: (a) liked images have a heart icon and disliked images have reduced opacity; (b) a pinned node; (c) a minimized node. In the mini-map (d), the degree of like/dislike for each node is encoded with color (blue for "like", orange for "dislike"), and a location icon marks each pinned node.
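The three node forms in Figure 2 suggest a simple tree of nodes. Below is a minimal Python sketch of forking and image extraction under our own assumptions: `PromptNode`, `ImageNode`, `fork`, and `extract_image` are hypothetical names, images are plain file-name strings, and a fork copies the prompt and parameters but not the images (which would be regenerated). It is an illustration, not PromptMap's actual implementation.

```python
import copy
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PromptNode:
    """Hypothetical prompt node (Figure 2a): a prompt, its parameters, and generated images."""
    prompt: str
    params: dict
    images: list = field(default_factory=list)
    # Tree links are excluded from repr/eq to avoid recursing through parent <-> child cycles.
    parent: Optional["PromptNode"] = field(default=None, repr=False, compare=False)
    children: list = field(default_factory=list, repr=False, compare=False)

    def fork(self) -> "PromptNode":
        """Start a child node (Figure 2b) from a copy of this node's prompt and settings.

        Assumption: images are not copied; the child regenerates them after the
        user edits its prompt or parameters.
        """
        child = PromptNode(self.prompt, copy.deepcopy(self.params), parent=self)
        self.children.append(child)
        return child

    def extract_image(self, index: int) -> "ImageNode":
        """Model dragging one generated image out into its own node (Figure 2c)."""
        return ImageNode(image=self.images[index], source=self)


@dataclass
class ImageNode:
    """Hypothetical image node: a single image of interest plus the node it came from."""
    image: str
    source: PromptNode = field(repr=False, compare=False)
```

Deep-copying the parameters keeps edits made in a forked node from leaking back into its parent, which matches the caption's description of a fork as starting from a duplicate of the original settings.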
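Figure 3 describes a subspace node as a compact stand-in for a set of related prompt nodes. One plausible reading, sketched below in Python, is that expanding a subspace enumerates the Cartesian product of all dimension values and substitutes each combination into the prompt. The classes `Dimension` and `SubspaceNode`, the `add_dimension`/`expand` methods, and the placeholder scheme are hypothetical illustrations, not the paper's code.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Dimension:
    """Hypothetical dimension: the prompt text the user selected and its alternative values."""
    name: str
    selected: str
    values: list


@dataclass
class SubspaceNode:
    """Hypothetical subspace node: one base prompt with any number of dimensions."""
    prompt: str
    dimensions: list = field(default_factory=list, init=False)
    _template: str = field(default="", init=False, repr=False)

    def __post_init__(self) -> None:
        self._template = self.prompt

    def add_dimension(self, name: str, selected: str, values: list) -> None:
        """Turn selected prompt text into a dimension (the context-menu action in Figure 3)."""
        if selected not in self._template:
            raise ValueError(f"{selected!r} does not occur in the remaining prompt text")
        # A private-use code point marks the slot; assumes user text never contains one.
        slot = chr(0xE000 + len(self.dimensions))
        self._template = self._template.replace(selected, slot, 1)
        self.dimensions.append(Dimension(name, selected, list(values)))

    def expand(self) -> list:
        """Expand into one concrete prompt per combination of dimension values."""
        prompts = []
        for combo in product(*(d.values for d in self.dimensions)):
            text = self._template
            for i, value in enumerate(combo):
                text = text.replace(chr(0xE000 + i), value)
            prompts.append(text)
        return prompts
```

For example, the prompt "a cat in a forest, watercolor" with a subject dimension {cat, dog} and a style dimension {watercolor, oil painting} expands into four prompts, one per combination; a node with no dimensions expands to just its own prompt.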
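The index arithmetic behind the nested grid in Figure 4 can be sketched as follows. This assumes, which the caption does not state, that dimensions alternate between the x and y axes in order (first to x, second to y, third to x again), with earlier dimensions forming the outer, coarser grid. The function names `grid_shape`, `stack_position`, and `all_cells` are hypothetical.

```python
from itertools import product
from math import prod
from typing import Sequence


def grid_shape(sizes: Sequence[int]) -> tuple:
    """Total (columns, rows) of the stacked grid for dimensions with the given value counts."""
    # Dimensions at even positions share the x axis, odd positions the y axis.
    return prod(sizes[0::2]), prod(sizes[1::2])


def stack_position(sizes: Sequence[int], indices: Sequence[int]) -> tuple:
    """Map one combination of value indices to its (column, row) cell.

    Earlier dimensions are outer (coarse) grids; each later dimension on the same
    axis subdivides every cell of the previous one.
    """
    if len(sizes) != len(indices):
        raise ValueError("need one index per dimension")
    col = row = 0
    for pos, (size, idx) in enumerate(zip(sizes, indices)):
        if not 0 <= idx < size:
            raise ValueError(f"index {idx} out of range for dimension {pos}")
        if pos % 2 == 0:
            col = col * size + idx
        else:
            row = row * size + idx
    return col, row


def all_cells(sizes: Sequence[int]):
    """Yield (indices, (column, row)) for every combination, e.g. to lay out thumbnails."""
    for indices in product(*(range(s) for s in sizes)):
        yield indices, stack_position(sizes, indices)
```

With value counts `[2, 3, 2]` (e.g. subject, style, scene), the grid has 4 columns and 3 rows, and the combination `[1, 2, 0]` lands in column 2, row 2. Every combination maps to a distinct cell, which is what lets the grid show the whole subspace at once.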
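Figure 5 says the mini-map encodes each node's degree of like/dislike with blue and orange but does not give a formula. A minimal sketch, assuming a net score of (likes minus dislikes) divided by the number of images in the node, and a linear blend from white toward hypothetical endpoint colors; `curation_color` and the RGB constants are our own choices, not values from the paper.

```python
# Hypothetical endpoint colors (RGB); the paper only says "blue" and "orange".
BLUE = (31, 119, 180)
ORANGE = (255, 127, 14)
WHITE = (255, 255, 255)


def curation_color(likes: int, dislikes: int, total: int) -> str:
    """Return a hex color for a node's mini-map glyph from its image ratings.

    Assumed score: (likes - dislikes) / total, in [-1, 1]. Positive scores blend
    from white toward blue, negative scores from white toward orange.
    """
    if total <= 0:
        return "#ffffff"
    if likes < 0 or dislikes < 0 or likes + dislikes > total:
        raise ValueError("inconsistent like/dislike counts")
    score = (likes - dislikes) / total
    target = BLUE if score >= 0 else ORANGE
    t = abs(score)
    # Linear interpolation per channel between white and the target color.
    r, g, b = (round(w + (c - w) * t) for w, c in zip(WHITE, target))
    return f"#{r:02x}{g:02x}{b:02x}"
```

Normalizing by the node's image count keeps large and small nodes comparable; a node where likes and dislikes cancel out stays neutral white.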