PromptMap: An Alternative Interaction Style for AI-Based Image Generation

Krzysztof Adamkiewicz, Paweł W. Woźniak, Julia Dominiak, Andrzej Romanowski, Jakob Karolus, Stanislav Frolov

TL;DR

PromptMap introduces a map-based, semantic-zoom interface to explore a vast, synthetic collection of prompts for text-to-image generation, addressing novices' prompting difficulties. It combines a large-scale synthetic dataset (over 10 million prompts) with a 2D density map, labels, and search to support structured, inspiration-driven exploration, rather than language-only prompt crafting. Through a between-subject quantitative study and a within-subject qualitative study, the authors show that presenting examples shifts user strategy toward example-driven exploration and that synthetic data generation can yield diverse, high-quality prompts with lower NSFW content than scraped datasets. The work contributes a new interaction paradigm, a scalable data-generation pipeline, and open resources (dataset and code) that can influence future interface design and cross-modal prompting tasks.

Abstract

Recent technological advances popularized the use of image generation among the general public. Crafting effective prompts can, however, be difficult for novice users. To tackle this challenge, we developed PromptMap, a new interaction style for text-to-image AI that allows users to freely explore a vast collection of synthetic prompts through a map-like view with semantic zoom. PromptMap groups images visually by their semantic similarity, allowing users to discover relevant examples. We evaluated PromptMap in a between-subject online study ($n=60$) and a qualitative within-subject study ($n=12$). We found that PromptMap supported users in crafting prompts by providing them with examples. We also demonstrated the feasibility of using LLMs to create vast example collections. Our work contributes a new interaction style that supports users unfamiliar with prompting in achieving a satisfactory image output.
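The map-like view with semantic zoom can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each prompt already has a 2D position (for example, from a projection of prompt embeddings), and the names `Label`, `density_grid`, `render_plan`, and the `point_zoom` threshold are hypothetical. It also swaps the gradual fade of the density map for a hard switch to individual points.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # assumed 2D map position in [0, 1) x [0, 1)


@dataclass
class Label:
    text: str
    min_zoom: int  # hypothetical: first zoom level at which the label is shown


def density_grid(points: List[Point], cell: float) -> Dict[Tuple[int, int], int]:
    """Count points per square cell of side `cell` (a plain 2D histogram)."""
    return dict(Counter((int(x // cell), int(y // cell)) for x, y in points))


def render_plan(points: List[Point], labels: List[Label], zoom: int,
                point_zoom: int = 3) -> Dict[str, Optional[object]]:
    """Decide what to draw at a zoom level: more labels as zoom grows,
    a density grid at low zoom, individual points at high zoom."""
    visible = [label.text for label in labels if label.min_zoom <= zoom]
    if zoom >= point_zoom:
        # Assumption: a hard cut-over instead of a gradual fade-out.
        return {"labels": visible, "points": points, "density": None}
    cell = 1.0 / (2 ** zoom)  # cells halve in size with each zoom step
    return {"labels": visible, "points": None,
            "density": density_grid(points, cell)}


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.2), (0.9, 0.9)]
    lbls = [Label("animals", 0), Label("cats", 1), Label("kittens", 3)]
    for z in (0, 1, 3):
        print(z, render_plan(pts, lbls, z))
```

Tying label visibility to a per-label minimum zoom keeps the overview uncluttered while still exposing fine-grained topics once the user zooms in.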

Paper Structure

This paper contains 35 sections, 8 figures.

Figures (8)

  • Figure 1: Map view shown in the PromptMap interface implements semantic zoom. As the user zooms in, more labels appear. At the highest levels of zoom, the density map fades out and is replaced by examples represented as individual points. The blue color indicates density. Samples with similar topics form clusters and are visible as darker blobs on the map.
  • Figure 2: PromptMap relies on the recursive expansion of concepts to generate a large dataset of prompts with an LLM. We initialize the generation with $160$ general categories describing images, obtained from the GPT-4o model. The first three expansion stages create captions describing ideas for images: we ask the LLM to find subcategories of the general categories, then subcategories of those subcategories, and then, for each subcategory, we generate several ideas. We continue the recursive expansion by prompting the LLM for $10$ location captions per idea and then, for each location and its parent idea, for $5$ main-subject captions. Finally, we prompt the LLM to merge each subject with its parent location and idea into a prompt. This process generates $12.3$M prompts from the initial set of categories. For clarity, deduplication is omitted from this plot, and only two outputs are shown for each input.
  • Figure 3: We plot the number of unique subjects obtained through annotation of images against the number of examples in a random $50$k image sample. We compare our dataset against DiffusionDB [wang2023diffusiondb], SynthCI-50M [hammoud2024synthclip], and directly prompting the model to generate image captions. Our dataset demonstrates a significantly larger number of unique subjects than the tested baselines.
  • Figure 4: We compare the distribution of prompt lengths between our synthetic dataset and human-written prompts from DiffusionDB [wang2023diffusiondb]. We find that our prompts (M=$17.2$, SD=$3.7$) are, on average, more concise and more consistent in length than prompts in DiffusionDB (M=$38.0$, SD=$22.3$).
  • Figure 5: During the user studies, participants used the UI shown in the teaser figure (fig:teaser) and two ablated versions of it. In the No Support condition, the search bar was removed and the tab with generated images replaced the map view. In the Nearest Neighbor condition, users could search DiffusionDB [wang2023diffusiondb] for image examples, and the search results were displayed as a grid view that replaced the map. Finally, in the PromptMap condition, participants used the full version of the interface and could explore the dataset using the map view and the search feature.
  • ...and 3 more figures
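
The recursive expansion described in the Figure 2 caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: `stub_llm` is a hypothetical offline stand-in for a real LLM call, the instruction strings are invented, and the fan-outs `n_sub`, `n_subsub`, and `n_ideas` are placeholders because the caption does not state them. Only the $10$ locations per idea and $5$ subjects per location come from the caption.

```python
from typing import Callable, List

# Hypothetical LLM interface (an assumption): given an instruction and a
# count n, return n short captions. A real pipeline would query a model.
LLMFn = Callable[[str, int], List[str]]


def stub_llm(instruction: str, n: int) -> List[str]:
    """Deterministic stand-in for an LLM so the sketch runs offline."""
    return [f"{instruction} [{i}]" for i in range(n)]


def dedup(items: List[str]) -> List[str]:
    """Drop exact duplicates while preserving order."""
    return list(dict.fromkeys(items))


def expand(parents: List[str], what: str, n: int, llm: LLMFn) -> List[str]:
    """Ask the LLM for n children of every parent caption."""
    children: List[str] = []
    for parent in parents:
        children.extend(llm(f"{what} of <{parent}>", n))
    return dedup(children)


def generate_prompts(categories: List[str], llm: LLMFn,
                     n_sub: int = 2, n_subsub: int = 2, n_ideas: int = 2,
                     n_locations: int = 10, n_subjects: int = 5) -> List[str]:
    """Categories -> subcategories -> sub-subcategories -> ideas
    -> locations -> subjects -> merged prompts."""
    subcats = expand(categories, "subcategory", n_sub, llm)
    subsubcats = expand(subcats, "subcategory", n_subsub, llm)
    ideas = expand(subsubcats, "image idea", n_ideas, llm)
    prompts: List[str] = []
    for idea in ideas:
        for loc in llm(f"location for <{idea}>", n_locations):
            for subj in llm(f"main subject for <{idea}> at <{loc}>", n_subjects):
                # Final stage: merge subject, location, and idea into one prompt.
                prompts.extend(llm(f"merge <{subj}> | <{loc}> | <{idea}>", 1))
    return dedup(prompts)


if __name__ == "__main__":
    print(len(generate_prompts(["animals", "vehicles"], stub_llm)))
```

As a rough consistency check on the reported numbers, and ignoring deduplication, $12.3$M prompts from $160$ categories is about $76{,}875$ prompts per category; dividing by the $10 \times 5 = 50$ prompts produced per idea suggests roughly $1{,}500$ ideas per category on average.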