Interaction-Augmented Instruction: Modeling the Synergy of Prompts and Interactions in Human-GenAI Collaboration

Leixian Shen, Yifang Wang, Huamin Qu, Xing Xie, Haotian Li

TL;DR

The paper tackles the challenge of conveying fine-grained human intent to GenAI systems by proposing the Interaction-Augmented Instruction (IAI) model, a compact entity-relation graph with six entities that combines text prompts, GUI interactions, artifacts, and augmented instructions into a single input for GenAI. It systematically derives twelve atomic paradigms from a corpus of GenAI-enabled interfaces, enabling descriptive, discriminative, and generative analysis of how prompts and interactions cooperate. Through four usage scenarios, the authors demonstrate how the IAI model guides design, enables pathway extension, and supports the creation of new interaction paradigms for evolving tasks such as multi-agent workflows and context-aware generation. The work establishes a formal framework for comparing, composing, and extending interaction designs, with potential to shape more transparent, controllable, and capable GenAI-driven collaboration across domains.

Abstract

Text prompts are the most common means of human-generative AI (GenAI) communication. Though convenient, they make it challenging to convey fine-grained and referential intent. One promising solution is to combine text prompts with precise GUI interactions, such as brushing and clicking. However, no formal model exists to describe synergistic designs between prompts and interactions, which hinders their comparison and innovation. To fill this gap, via an iterative and deductive process, we develop the Interaction-Augmented Instruction (IAI) model, a compact entity-relation graph formalizing how the combination of interactions and text prompts enhances human-generative AI communication. With the model, we distill twelve recurring and composable atomic interaction paradigms from prior tools, verifying our model's capability to facilitate systematic design characterization and comparison. Case studies further demonstrate the model's utility in applying, refining, and extending these paradigms. These results illustrate our IAI model's descriptive, discriminative, and generative power for shaping future GenAI systems.

Paper Structure

This paper contains 27 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Interaction-Augmented Instruction Model.
  • Figure 2: Examples of pre-invocation paradigms, including prompt-only (P1-P3) and artifact-grounded (P4): (P1) Interactive prompt enhancement [Almeda2024]; (P2) Interactive prompt organization [Zhu]; (P3) Interaction as instruction [Lin2025a]; (P4) Artifact as instruction [Masson2023b].
  • Figure 3: Examples of post-invocation, prompt-only paradigms: (P5) AI-driven prompt suggestion [Angert2023]; (P6) AI-driven prompt decomposition [Cai2024]; (P7) Generative prompt control widgets [Wang2024g]; (P8) Generative artifact control widgets [Vaithilingam2024].
  • Figure 4: Examples of post-invocation, artifact-grounded paradigms: (P9) Artifact to organized instruction [Zhang2023b]; (P10) Artifact to multimodal instruction [Singh2024]; (P11) Artifact-driven prompt enhancement [Chen2023e]; (P12) Interactive artifact refinement [Tang2024].
  • Figure 5: Usage Scenario 1: Extending Pipelines through Chained Paradigm Graphs. For example, DynaVis [Vaithilingam2024] supports post-generation visualization refinement (P8). By chaining a pre-generation disambiguation paradigm (P5), the system can clarify ambiguous terms before execution, augmenting rather than replacing existing workflows.
  • ...and 3 more figures