Exploring Visual Prompts: Refining Images with Scribbles and Annotations in Generative AI Image Tools
Hyerim Park, Malin Eiband, Andre Luckow, Michael Sedlmair
TL;DR
This work investigates how professional designers refine GenAI-generated images using three input methods: text prompts, annotations, and scribbles, going beyond text-only prompting. In a preliminary digital paper-based study with seven designers, the authors compare how these inputs support refinement tasks across six categories. Annotations work best for spatial adjustments and for referencing in-image elements, scribbles help specify attributes, and text prompts suit detailed descriptions and global changes. Each method has limitations, such as the risk of misinterpretation or the high effort of prompting. Designers often mixed inputs, which suggests that refinement tasks benefit from multimodal interfaces that align with real workflows, support smooth transitions among methods, and account for varied work setups. In practice, the findings inform the design of GenAI interfaces that communicate intent better, balance user control with AI creativity, and improve precision during the refinement stages of design pipelines. Future research directions include dynamic adaptation of input methods, improved interpretation of annotations, and the question of whether tools should adapt to users or vice versa.
Abstract
Generative AI (GenAI) tools are increasingly integrated into design workflows. While text prompts remain the primary input method for GenAI image tools, designers often struggle to craft effective ones. Moreover, research has primarily focused on input methods for ideation, with limited attention to refinement tasks. This study explores designers' preferences for three input methods (text prompts, annotations, and scribbles) through a preliminary digital paper-based study with seven professional designers. Designers preferred annotations for spatial adjustments and referencing in-image elements, while scribbles were favored for specifying attributes such as shape, size, and position, often combined with other methods. Text prompts excelled at providing detailed descriptions or when designers sought greater GenAI creativity. However, designers expressed concerns that the AI might misinterpret annotations and scribbles, and about the effort needed to create effective text prompts. These insights inform GenAI interface design to better support refinement tasks, align with workflows, and enhance communication with AI systems.
