Prompting Generative AI with Interaction-Augmented Instructions

Leixian Shen, Haotian Li, Yifang Wang, Xing Xie, Huamin Qu

TL;DR

This work tackles the ambiguity of natural-language prompts in GenAI interactions by proposing interaction-augmented instructions. It introduces a GenAI-based Instrumental Interaction Model and a 4W1H framework (why, when, who, what, and how) to analyze how interactions are used to shape prompts and actions on domain objects. A corpus of 52 tools is analyzed to distill four design paradigms (restrict, expand, organize, and refine) that encode concrete interaction patterns. The work provides a structured basis for designing human-AI collaboration systems and identifies future research directions, including expanding the design space and examining human agency and implementation challenges.

Abstract

The emergence of generative AI (GenAI) models, including large language models and text-to-image models, has significantly advanced the synergy between humans and AI, owing not only to their outstanding capabilities but, more importantly, to the intuitive way of communicating with them through text prompts. Though intuitive, text-based instructions suffer from the ambiguity and redundancy of natural language. To address this issue, researchers have explored augmenting text-based instructions with interactions, such as direct manipulation, that facilitate precise and effective expression of human intent. However, the design strategies for interaction-augmented instructions have not been systematically investigated, which hinders both understanding and application. To provide a panorama of interaction-augmented instructions, we propose a framework that analyzes related tools in terms of why, when, who, what, and how interactions are applied to augment text-based instructions. Notably, we identify four purposes for applying interactions: restricting, expanding, organizing, and refining text instructions. We also summarize the design paradigms for each purpose to benefit future researchers and practitioners.

Paper Structure

This paper contains 11 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: GenAI-based instrumental interaction model.
  • Figure 2: The 4W1H analytical framework of existing GenAI tools featuring interaction-augmented instructions. The "Why" dimension describes four impact types of interactions on instructions.
  • Figure 3: Examples of interaction-augmented instructions, displayed by four impact types of interaction on instructions ("Why") and four sub-categories based on "When" and "Who" in Sec. \ref{sec:pattern}. In each example, one icon denotes the core interaction design, while another indicates the absence of typical paradigm instances identified in our paper, left for future exploration. The examples are from (a1) Masson2023b, (a3) canvas, (a4) Laban2023, (b1) dataplaywright, (b2) Wu2023c, (b3) Liu2024b, (b4) Zhang2023b, (c1) Zhu, (d1) Wang2024g, and (d2) Angert2023.