Exploring Visual Prompts: Refining Images with Scribbles and Annotations in Generative AI Image Tools

Hyerim Park, Malin Eiband, Andre Luckow, Michael Sedlmair

TL;DR

This work investigates how professional designers refine GenAI-generated images using three input methods: text prompts, annotations, and scribbles. In a preliminary digital paper-based study with seven designers, the authors compare how these inputs support refinement tasks across six categories. Annotations work best for spatial adjustments and for referencing in-image elements, scribbles help specify attributes such as shape, size, and position, and text prompts suit detailed descriptions and global changes. Each method has limitations: designers worried that AI would misinterpret annotations and scribbles, and crafting effective text prompts takes effort. Designers often mixed inputs, which suggests that refinement benefits from multimodal interfaces that align with real workflows, support smooth transitions among methods, and account for varied work setups. Practically, the findings inform the design of GenAI interfaces that communicate intent better, balance user control with AI creativity, and improve precision during the refinement stages of design pipelines. Future research directions include dynamic adaptation of input methods, improved interpretation of annotations, and the question of whether tools should adapt to users or vice versa.

Abstract

Generative AI (GenAI) tools are increasingly integrated into design workflows. While text prompts remain the primary input method for GenAI image tools, designers often struggle to craft effective ones. Moreover, research has primarily focused on input methods for ideation, with limited attention to refinement tasks. This study explores designers' preferences for three input methods - text prompts, annotations, and scribbles - through a preliminary digital paper-based study with seven professional designers. Designers preferred annotations for spatial adjustments and referencing in-image elements, while scribbles were favored for specifying attributes such as shape, size, and position, often combined with other methods. Text prompts excelled at providing detailed descriptions or when designers sought greater GenAI creativity. However, designers expressed concerns about AI misinterpreting annotations and scribbles and the effort needed to create effective text prompts. These insights inform GenAI interface design to better support refinement tasks, align with workflows, and enhance communication with AI systems.

Paper Structure

This paper contains 34 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Illustration of the Three Input Methods Tested in the Study: (from left to right) text prompts (typed instructions requiring inpainting for area selection), annotations (text or visual symbols on the image, with optional inpainting selected by the user), and scribbles (freeform scribbles on the image, with optional inpainting selected by the user).
  • Figure 2: Text Prompt Input Examples: (1) and (2) involve inpainting (highlighted in blue) applied to large areas for background changes, such as adding a forest ("Add a forest as a background outside the car") or increasing snowfall ("Make the snowfall bigger and heavier"). (3) A participant used inpainting to modify a man into a girlfriend figure with the text prompt, "Change this man into a woman with a girlfriend look." The girlfriend figure allowed room for AI's creative interpretation, as the prompt required less precise instructions. (4) A participant marked an area and provided the text prompt, "Add a Christmas tree decorated with ornaments and lighting. The decoration should be modern and chic, with yellow and white lights. It should match the building behind." The instructions were relatively lengthy and detailed. Some participants relied solely on text prompts without inpainting to adjust the overall mood of the image, such as removing the yellow tone entirely.
  • Figure 3: Annotation Input Examples: Annotations involve text or visual symbols (e.g., circles, arrows, lines, numbering), with arrows and circles being the most commonly used. At times, annotations were combined with scribbles and text prompts. (1) A participant used scribbles to indicate the size of a logo on the steering wheel or buttons on the display, paired with text annotations like "bigger" or "more padding." (2) Scribbles indicated snow distribution on mountains, clarified by a text annotation reading "Add snow." (3) An arrow and text annotation ("Switch to this pattern") were used to apply a checkered pattern from one person’s clothing to another’s in the image. (4) A participant used an arrow and dot to indicate a change in eye direction, with a text prompt stating "Change the eyes to look in the direction of the dots." (5) Numbering (e.g., "1" and "2") labeled two beer glasses, with a text prompt stating, "Make 1 and 2 the same size." (6) Visual symbols, including an arrow and a circle, showed how to move a plant to a new position in the image.
  • Figure 4: Scribble Input Examples: Most users combined scribbles with text prompts or annotations to add details. (1) A scribble depicted the shape, size, and rough design of a flying car station to be added. (2) A participant used scribbles to specify the intended size of a cake, a balloon, and a new element (a cup) on a desk without additional input. (3) Scribbles highlighted specific regions of a car for lighting effect adjustments, paired with annotations specifying areas to brighten (in blue) and darken (in green). (4) A participant used rough scribbles to indicate the size and orientation of a dachshund, complemented by an annotation labeling it as "dachshund." (5) Scribbles showed the desired position of a ribbon and modifications to a skirt shape, with an annotation specifying "ribbon." (6) A scribble outlined a beanie design without a pompom, while annotations and text prompts detailed adding a "Tottenham logo" to the pink-circled area.