DirectGPT: A Direct Manipulation Interface to Interact with Large Language Models

Damien Masson; Sylvain Malacria; Géry Casiez; Daniel Vogel

TL;DR

DirectGPT demonstrates a direct manipulation layer on top of a large language model to address prompting inefficiencies. By continuously representing outputs, enabling object-centered prompt localization, reusing prompts as tools, and providing undo, it achieves faster task completion with fewer, shorter prompts while maintaining or improving accuracy and usability. A within-subject study with text, code, and vector-image edits shows ~50% faster performance, ~50% fewer prompts, and ~72% shorter prompts, with high usability. The work offers a generalizable blueprint for integrating prompt-driven AI into traditional software, balancing control and exploratory capabilities to empower co-creative workflows.

Abstract

We characterize and demonstrate how the principles of direct manipulation can improve interaction with large language models. This includes: continuous representation of generated objects of interest; reuse of prompt syntax in a toolbar of commands; manipulable outputs to compose or control the effect of prompts; and undo mechanisms. This idea is exemplified in DirectGPT, a user interface layer on top of ChatGPT that works by transforming direct manipulation actions to engineered prompts. A study shows participants were 50% faster and relied on 50% fewer and 72% shorter prompts to edit text, code, and vector images compared to baseline ChatGPT. Our work contributes a validated approach to integrate LLMs into traditional software using direct manipulation. Data, code, and demo available at https://osf.io/3wt6s.
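As a rough illustration of "transforming direct manipulation actions to engineered prompts", the sketch below wraps a short user prompt with the current document and the selected object, and keeps a linear history so an edit can be undone. This is a minimal sketch under stated assumptions, not DirectGPT's implementation: the names `Selection`, `build_engineered_prompt`, and `History`, as well as the prompt wording, are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class Selection:
    """A selected object in the current output (hypothetical representation)."""
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def build_engineered_prompt(document: str, selection: Selection, user_prompt: str) -> str:
    """Wrap a short user prompt so that it applies only to the selected span.

    The wording of the template is an assumption made for illustration.
    """
    target = document[selection.start:selection.end]
    return (
        "Here is the current document:\n"
        f"{document}\n\n"
        f'Apply the following instruction only to the selected part "{target}" '
        "and leave everything else unchanged.\n"
        f"Instruction: {user_prompt}"
    )


@dataclass
class History:
    """Linear undo/redo over successive document states."""
    states: list
    index: int = 0

    def push(self, state: str) -> None:
        # A new edit discards any states that were undone (the redo branch).
        del self.states[self.index + 1:]
        self.states.append(state)
        self.index += 1

    def undo(self) -> str:
        # Step back one state, stopping at the initial document.
        if self.index > 0:
            self.index -= 1
        return self.states[self.index]

    def redo(self) -> str:
        # Step forward one state, stopping at the most recent edit.
        if self.index < len(self.states) - 1:
            self.index += 1
        return self.states[self.index]
```

In this sketch, selecting a word and typing "synonym" would yield a prompt that names the selected word explicitly, which is one way a single-word prompt can stay short while its effect stays localized. The model's reply would then be pushed onto the history so that it can be reverted.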

Paper Structure (45 sections, 6 figures)

Introduction
Background and Related Work
What is Direct Manipulation?
Five Issues With Prompting that Motivate the use of Direct Manipulation
Systems to Help Craft Better Prompts
Prompting through More Direct Interactions
Labelled Buttons Instead of Verbose Prompts
Physical Actions Instead of Verbose Prompts
Reversible Operations Instead of Continuous Conversations
Immediate Feedback Instead of Word-by-Word Generation
Blending Language and Direct Manipulation
DirectGPT: an exemplar direct interface for LLMs
Example Use Case
Direct Manipulation Principles for LLMs
Continuous Representation of the Last Output
...and 30 more sections

Figures (6)

Figure 1: DirectGPT used to replace words with synonyms: (a) selecting an object such as a word before writing a prompt forces the prompt to apply to only this object, localizing its effect; (b) once executed, the modifications are highlighted and the "synonym" ad hoc tool is created; (c) the "synonym" tool is used to quickly find synonyms for other words.
Figure 2: DirectGPT used to finish drawing a flower: (a) draw a line by referring to specific locations through drag-and-drop; (b) the line is drawn, click to specify where to add the circle; (c) refer to another circle to copy its size; (d) the circle is added.
Figure 3: Prompts can be reused as tools: (a) a prompt with two nouns is executed; (b) the prompt is abstracted into an ad hoc tool; (c) using the tool, a click sets the first noun; (d) a second click sets the second noun and executes the reused prompt.
Figure 4: Given the (a) starting image, the participant completed four tasks in the image activity with ChatGPT and DirectGPT from either the top or bottom row: (b) colourize using gradients; (c) add elements; (d) remove elements; (e) flip upside down.
Figure 5: Participants' responses when rating the 5-point statements for (a) DirectGPT and (b) ChatGPT. (c) Dots are the mean differences of DirectGPT compared to ChatGPT. Bars are the 95% CIs calculated with the studentized bootstrap method.
...and 1 more figures
