AI-Instruments: Embodying Prompts as Instruments to Abstract & Reflect Graphical Interface Commands as General-Purpose Tools

Nathalie Riche, Anna Offenwanger, Frederic Gmeiner, David Brown, Hugo Romat, Michel Pahud, Nicolai Marquardt, Kori Inkpen, Ken Hinckley

TL;DR

The paper addresses a limitation of chat-based prompting for creative design with generative AI: interactions are linear and hard to refine. It extends the instrumental interaction model with AI-Instruments, which rest on three principles (reification of user intent, reflection, and grounding), and demonstrates four technology probes for image generation. Through a qualitative study with 12 participants, it shows that these instruments support non-linear exploration, direct manipulation, and richer intent formulation and resolution than traditional prompting. The work contributes a general interaction framework, four concrete instruments (Fragments, Transformative Lenses, Generative Containers, Fillable Brushes), and the notion of meta-instruments (Palettes) to organize complex instrument collections, with implications for broader AI-enabled creative workflows.

Abstract

Chat-based prompts respond with verbose linear-sequential texts, making it difficult to explore and refine ambiguous intents, back up and reinterpret, or shift directions in creative AI-assisted design work. AI-Instruments instead embody "prompts" as interface objects via three key principles: (1) Reification of user-intent as reusable direct-manipulation instruments; (2) Reflection of multiple interpretations of ambiguous user-intents (Reflection-in-intent) as well as the range of AI-model responses (Reflection-in-response) to inform design "moves" towards a desired result; and (3) Grounding to instantiate an instrument from an example, result, or extrapolation directly from another instrument. Further, AI-Instruments leverage LLMs to suggest, vary, and refine new instruments, enabling a system that goes beyond hard-coded functionality by generating its own instrumental controls from content. We demonstrate four technology probes, applied to image generation, and qualitative insights from twelve participants, showing how AI-Instruments address challenges of intent formulation, steering via direct manipulation, and non-linear iterative workflows to reflect and resolve ambiguous intents.
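
The three principles in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Instrument` class, the helper functions, and the stub `echo_model` and `describe_style` callables are all hypothetical names introduced here, and a real system would call an LLM or image model where the stubs return placeholder strings.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for a generative model: (prompt, seed) -> output.
Model = Callable[[str, int], str]


@dataclass(frozen=True)
class Instrument:
    """Reification: a prompt held as a reusable object (assumed structure)."""
    intent: str
    origin: str = "typed"  # "typed" by the user, or "grounded" in an example

    def apply(self, target: str, model: Model, seed: int = 0) -> str:
        """Apply the reified intent to a target, such as an image."""
        return model(f"{self.intent} | {target}", seed)


def reflect_in_intent(instrument: Instrument, readings: List[str]) -> List[Instrument]:
    """Reflection-in-intent: one instrument per reading of an ambiguous intent."""
    return [Instrument(f"{instrument.intent} ({r})", instrument.origin) for r in readings]


def reflect_in_response(instrument: Instrument, target: str, model: Model, n: int = 3) -> List[str]:
    """Reflection-in-response: sample a range of outputs for the same intent."""
    return [instrument.apply(target, model, seed) for seed in range(n)]


def ground(example: str, describe: Callable[[str], str]) -> Instrument:
    """Grounding: derive an instrument from an example instead of typed words."""
    return Instrument(describe(example), origin="grounded")


def echo_model(prompt: str, seed: int) -> str:
    """Stub model (assumption): echoes the prompt rather than generating content."""
    return f"<result {seed} for '{prompt}'>"


def describe_style(example: str) -> str:
    """Stub describer (assumption): a real system might extract style with a model."""
    return f"in the style of {example}"
```

For instance, `ground("bird.png", describe_style)` yields an instrument that can be applied to any other target without the user writing a prompt, while `reflect_in_intent(Instrument("warm colors"), ["palette", "lighting"])` returns two alternative instruments to choose between.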

Paper Structure

This paper contains 39 sections, 9 figures.

Figures (9)

  • Figure 1: Sequence of interactions to explore ideas with generative containers and lens probes: When dragging an image into a container (1), variations are created based on style (2). When selecting one of these images and dragging it into another container with the prompt "different types of bird", variations of different kinds of birds are generated in a consistent art style (3). A transformative lens around one of the earlier images generates a landscape around the bird through inpainting (4), and allows more complex composition of content (5, 6).
  • Figure 2: Sequence of interactions to steer image generation with fragments and brushes probes: Prompt fragments are generated for an existing image and show dimensions of the image to manipulate (1). A person can modify any of these fragments and a new image is generated (2). Containers can generate variations of fragments, which are then used to modify the image (3). Fillable Brushes (pen-like instruments) are used to modify the image of a castle, changing the art rendering style and color where the brush painted over the image, based on the prompt that was 'filled' into the pen (4, 5).
  • Figure 3: In the chat-based interaction model, interactions consist of a linear sequence of input+output pairs and steering is done by modifying the input (1). Reification enables articulating interactions into phrases, for example by reusing the output of the prior input (2). It also affords direct manipulation techniques such as lasso selection (in red) to specify the scope of the input (3). Reification of user intent enables users to reflect on their intent and navigate dimensions such as its degree of abstraction, for example by using other instruments to make it more concrete (4) or abstract (5).
  • Figure 4: Reflection-in-intent enables users to gain awareness of the possible formulations of their intent while reflection-in-response enables users to assess the space of possibilities of the outputs generated by the model given an input. These aspects may help users address the challenges of intent disambiguation, resolution and steering.
  • Figure 5: Grounding an instrument such as a generative container with an example lets users refer to features to preserve or alter in simple words by leveraging AI segmentation (1). Grounding an instrument such as a fillable brush in a specific aspect of an example, for example by selecting a region and extracting its style (2), enables users to apply it to other inputs without needing to articulate it in words. The principle of grounding also applies to instruments themselves, such as deriving fragments from an example fragment (3).
  • ...and 4 more figures