Prompting Generative AI with Interaction-Augmented Instructions
Leixian Shen, Haotian Li, Yifang Wang, Xing Xie, Huamin Qu
TL;DR
This work tackles the ambiguity of natural-language prompts in GenAI interactions by proposing interaction-augmented instructions. It introduces a GenAI-based Instrumental Interaction Model and a 4W1H framework (why, when, who, what, and how) to analyze how interactions shape prompts and actions on domain objects. A corpus of 52 tools is analyzed to distill four purposes for applying interactions (restrict, expand, organize, and refine), each with design paradigms that encode concrete interaction patterns. The work provides a structured basis for designing human-AI collaboration systems and identifies future research directions, including expanding the design space and examining human agency and implementation challenges.
Abstract
The emergence of generative AI (GenAI) models, including large language models and text-to-image models, has significantly advanced the synergy between humans and AI, not only through their outstanding capabilities but, more importantly, through intuitive communication via text prompts. Though intuitive, text-based instructions suffer from the ambiguous and redundant nature of natural language. To address this issue, researchers have explored augmenting text-based instructions with interactions, such as direct manipulation, that help users express their intent precisely and effectively. However, the design strategies behind interaction-augmented instructions have not been systematically investigated, which hinders both understanding and application. To provide a panorama of interaction-augmented instructions, we propose a framework that analyzes related tools along five dimensions: why, when, who, what, and how interactions are applied to augment text-based instructions. Notably, we identify four purposes for applying interactions: restricting, expanding, organizing, and refining text instructions. We also summarize the design paradigms for each purpose to benefit future researchers and practitioners.
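As an illustration of how the five dimensions could be recorded when annotating a tool, the following is a minimal sketch and not the authors' coding scheme. Only the four purposes and the five dimension names come from the abstract; the `ToolAnnotation` class, the `count_by_purpose` helper, the free-text field values, and the tool names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Purpose(Enum):
    """The four purposes (the 'why' dimension) named in the abstract."""
    RESTRICT = "restrict"
    EXPAND = "expand"
    ORGANIZE = "organize"
    REFINE = "refine"


@dataclass(frozen=True)
class ToolAnnotation:
    """One tool described along the why/when/who/what/how dimensions.

    Only `why` is constrained to known values; the other fields are
    free text here because the abstract does not list their categories.
    """
    tool: str
    why: Purpose
    when: str
    who: str
    what: str
    how: str


def count_by_purpose(annotations):
    """Return the number of annotated tools for each of the four purposes."""
    counts = Counter(a.why for a in annotations)
    return {purpose: counts.get(purpose, 0) for purpose in Purpose}


# Hypothetical entries, invented purely for illustration.
corpus = [
    ToolAnnotation("tool-a", Purpose.RESTRICT, "before generation",
                   "user", "image region", "direct manipulation"),
    ToolAnnotation("tool-b", Purpose.REFINE, "after generation",
                   "user", "generated text", "direct manipulation"),
]

for purpose, n in count_by_purpose(corpus).items():
    print(purpose.value, n)
```

Keeping `why` as an enumeration while leaving the other dimensions as free text reflects what the abstract states and avoids guessing at categories it does not name.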
