DirectGPT: A Direct Manipulation Interface to Interact with Large Language Models
Damien Masson, Sylvain Malacria, Géry Casiez, Daniel Vogel
TL;DR
DirectGPT demonstrates a direct manipulation layer on top of a large language model to address prompting inefficiencies. By continuously representing outputs, enabling object-centered prompt localization, reusing prompts as tools, and providing undo, it achieves faster task completion with fewer, shorter prompts while maintaining or improving accuracy and usability. In a within-subject study in which participants edited text, code, and vector images, they completed tasks about 50% faster than with baseline ChatGPT, using about 50% fewer prompts that were about 72% shorter, with high usability. The work offers a generalizable blueprint for integrating prompt-driven AI into traditional software, balancing control and exploratory capabilities to support co-creative workflows.
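The TL;DR does not say how prompt reuse and undo are implemented. The following Python is a minimal sketch under two assumptions: undo keeps a snapshot of each prior state, and a saved prompt is stored as a plain instruction string that is re-sent with the current content. `Workspace`, `Toolbar`, and `fake_llm` are hypothetical names. `fake_llm` is an offline stand-in for a real model call, not part of DirectGPT.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical model interface: takes a prompt string, returns generated text.
LLM = Callable[[str], str]


@dataclass
class Workspace:
    """The object of interest, always displayed, plus a history for undo."""
    content: str
    history: List[str] = field(default_factory=list)

    def apply(self, new_content: str) -> None:
        # Snapshot the current state before every model-driven edit.
        self.history.append(self.content)
        self.content = new_content

    def undo(self) -> bool:
        # Restore the most recent snapshot; report whether anything was undone.
        if not self.history:
            return False
        self.content = self.history.pop()
        return True


@dataclass
class Toolbar:
    """Prompts the user chose to keep, reusable as named commands."""
    tools: Dict[str, str] = field(default_factory=dict)

    def save(self, name: str, prompt: str) -> None:
        self.tools[name] = prompt

    def run(self, name: str, ws: Workspace, llm: LLM) -> None:
        # Re-issue the stored instruction against the current content.
        prompt = f"{self.tools[name]}\n\n{ws.content}"
        ws.apply(llm(prompt))


def fake_llm(prompt: str) -> str:
    """Offline stand-in for a real model so the sketch runs without an API."""
    instruction, _, text = prompt.partition("\n\n")
    return text.upper() if "uppercase" in instruction.lower() else text


if __name__ == "__main__":
    ws = Workspace("hello world")
    bar = Toolbar()
    bar.save("shout", "Convert the text to uppercase.")
    bar.run("shout", ws, fake_llm)
    print(ws.content)  # HELLO WORLD
    ws.undo()
    print(ws.content)  # hello world
```

Storing whole snapshots keeps undo trivial for small objects. A real editor might store diffs instead, but this sketch does not attempt that.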
Abstract
We characterize and demonstrate how the principles of direct manipulation can improve interaction with large language models. This includes: continuous representation of generated objects of interest; reuse of prompt syntax in a toolbar of commands; manipulable outputs to compose or control the effect of prompts; and undo mechanisms. This idea is exemplified in DirectGPT, a user interface layer on top of ChatGPT that works by transforming direct manipulation actions into engineered prompts. A study shows participants were 50% faster and relied on 50% fewer and 72% shorter prompts to edit text, code, and vector images compared to baseline ChatGPT. Our work contributes a validated approach to integrate LLMs into traditional software using direct manipulation. Data, code, and demo available at https://osf.io/3wt6s.
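The abstract states that DirectGPT transforms direct manipulation actions into engineered prompts but does not give the prompt templates. The sketch below assumes the action is "select a span of text, then issue a command." Under that assumption, a system might mark the selection inline, ask the model to rewrite only that span, and splice the reply back into the document. The `<selection>` markers, the prompt wording, and the names `Selection`, `build_prompt`, and `splice` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def build_prompt(document: str, sel: Selection, command: str) -> str:
    """Turn a 'select, then command' action into an engineered prompt (hypothetical template)."""
    if not 0 <= sel.start <= sel.end <= len(document):
        raise ValueError("selection out of range")
    # Mark the selected span inline so the model knows which object to change.
    marked = (
        document[:sel.start]
        + "<selection>" + document[sel.start:sel.end] + "</selection>"
        + document[sel.end:]
    )
    return (
        "You are editing the document below. Apply the instruction ONLY to the text "
        "between <selection> and </selection>, and reply with the rewritten selection "
        "and nothing else.\n"
        f"Instruction: {command}\n"
        f"Document:\n{marked}"
    )


def splice(document: str, sel: Selection, replacement: str) -> str:
    """Write the model's reply back into the selected span, leaving the rest untouched."""
    return document[:sel.start] + replacement + document[sel.end:]


if __name__ == "__main__":
    doc = "The cat sat on the mat."
    sel = Selection(4, 7)  # covers "cat"
    print(build_prompt(doc, sel, "Replace with a synonym."))
    # Suppose the model replied "feline":
    print(splice(doc, sel, "feline"))  # The feline sat on the mat.
```

This sketch requests only the rewritten span rather than the whole document, which keeps the edit localized to the object the user pointed at. A production system would also need to validate the model's reply before splicing it in.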
