ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations

Kailas Vodrahalli, James Zou

TL;DR

This work provides insight into human-AI interaction behavior, presents a concrete method of assessing AI steerability, and demonstrates the general utility of the ArtWhisperer dataset.

Abstract

As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that generates an image similar to the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to quantify the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.
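
The steerability estimate described in the abstract (fit a Markov chain per target, then compute the expected time to reach an adequate score) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the discretization of scores into integer states, the function names `fit_transition_matrix` and `expected_hitting_times`, the `smoothing` parameter, and the uniform fallback for unobserved states are all assumptions made for the example.

```python
import numpy as np


def fit_transition_matrix(state_sequences, n_states, smoothing=0.0):
    """Estimate a row-stochastic transition matrix from user trajectories.

    state_sequences: list of integer state sequences, one per user, where each
    state is a (hypothetical) discretized score bin in range(n_states).
    """
    counts = np.full((n_states, n_states), smoothing, dtype=float)
    for seq in state_sequences:
        # Count each observed transition between consecutive prompts.
        for a, b in zip(seq[:-1], seq[1:]):
            counts[a, b] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    safe_sums = np.where(row_sums > 0, row_sums, 1.0)
    # Assumption: states with no observed outgoing transitions get a uniform row.
    uniform = np.full_like(counts, 1.0 / n_states)
    return np.where(row_sums > 0, counts / safe_sums, uniform)


def expected_hitting_times(P, adequate_states):
    """Expected number of transitions to reach any adequate state.

    Treats adequate states as absorbing and solves (I - Q) t = 1, where Q is
    the transition matrix restricted to the non-adequate (transient) states.
    Raises numpy.linalg.LinAlgError if no adequate state is reachable.
    """
    n = P.shape[0]
    adequate = set(adequate_states)
    transient = [s for s in range(n) if s not in adequate]
    Q = P[np.ix_(transient, transient)]
    t = np.linalg.solve(np.eye(len(transient)) - Q, np.ones(len(transient)))
    times = np.zeros(n)  # adequate states need zero further steps
    times[transient] = t
    return times


# Toy example: three score bins, with bin 2 treated as "adequate".
P = fit_transition_matrix([[0, 0, 1, 1, 2]], n_states=3)
times = expected_hitting_times(P, adequate_states=[2])
print(times)  # expected further prompts from bins 0, 1, 2: 4, 2, 0
```

Averaging `times` over the distribution of first-prompt states would give a single per-target number. Under this sketch a lower value means the target is easier to steer toward, and the result counts transitions after the first prompt rather than the first prompt itself.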

Paper Structure (40 sections, 10 equations, 27 figures, 4 tables, 1 algorithm)


Figures (27)

  • Figure 1: Interface of the ArtWhisperer game. Prompts entered on right. Target (goal) image and player-generated image on left. Previous prompts and scores are displayed in the lower right.
  • Figure 2: Example user trajectories. In each row, we show (1) a given user's prompts, (2) the target image (rightmost image), and (3) a plot of this target image's average score trajectory across users (blue), this user's full score trajectory (red), and the displayed images (orange).
  • Figure 3: Left: Distribution of the number of user queries per target image (the average is 9.18 queries per image). Right: Distribution of the number of words submitted in a query (the average is 20.02 words for positive prompts and 2.32 words for negative prompts).
  • Figure 4: Diverse, high-scoring prompt submissions from different users. Target image in rightmost column.
  • Figure 5: Difference of distance from the first prompt to ground truth and distance from the last (best) prompt to ground truth for CLIP text (blue) and CLIP image embeddings (orange).
  • ...and 22 more figures