DirectGPT: A Direct Manipulation Interface to Interact with Large Language Models

Damien Masson, Sylvain Malacria, Géry Casiez, Daniel Vogel

TL;DR

DirectGPT demonstrates a direct manipulation layer on top of a large language model to address prompting inefficiencies. By continuously representing outputs, enabling object-centered prompt localization, reusing prompts as tools, and providing undo, it achieves faster task completion with fewer, shorter prompts while maintaining or improving accuracy and usability. A within-subject study with text, code, and vector-image edits shows ~50% faster performance, ~50% fewer prompts, and ~72% shorter prompts, with high usability. The work offers a generalizable blueprint for integrating prompt-driven AI into traditional software, balancing control and exploratory capabilities to empower co-creative workflows.

Abstract

We characterize and demonstrate how the principles of direct manipulation can improve interaction with large language models. This includes: continuous representation of generated objects of interest; reuse of prompt syntax in a toolbar of commands; manipulable outputs to compose or control the effect of prompts; and undo mechanisms. This idea is exemplified in DirectGPT, a user interface layer on top of ChatGPT that works by transforming direct manipulation actions to engineered prompts. A study shows participants were 50% faster and relied on 50% fewer and 72% shorter prompts to edit text, code, and vector images compared to baseline ChatGPT. Our work contributes a validated approach to integrate LLMs into traditional software using direct manipulation. Data, code, and demo available at https://osf.io/3wt6s.
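The abstract says DirectGPT works by transforming direct manipulation actions into engineered prompts. The sketch below illustrates that idea for two of the listed principles: localizing a prompt to a selected object and reusing an executed prompt as a tool. It is a minimal illustration under stated assumptions, not the authors' implementation: the `Selection` type, the `build_localized_prompt` and `make_tool` functions, and the prompt template wording are all hypothetical, and no LLM call is made.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    """A user-selected span in the displayed output (hypothetical type)."""
    start: int  # index of the first selected character
    end: int    # index one past the last selected character


def build_localized_prompt(document: str, selection: Selection, instruction: str) -> str:
    """Wrap a short instruction into a longer prompt scoped to the selection.

    The template wording is illustrative only, not DirectGPT's actual prompt.
    """
    if not (0 <= selection.start < selection.end <= len(document)):
        raise ValueError("selection must be a non-empty span inside the document")
    target = document[selection.start:selection.end]
    return (
        "You are editing the document below.\n"
        f"Apply this instruction ONLY to the selected text: {instruction}\n"
        f'Selected text: "{target}"\n'
        "Leave everything else unchanged and return the full document.\n"
        "---\n"
        f"{document}"
    )


def make_tool(instruction: str):
    """Turn an executed instruction into a reusable tool (a closure)."""
    def tool(document: str, selection: Selection) -> str:
        # Reusing the tool only needs a new selection, not a retyped prompt.
        return build_localized_prompt(document, selection, instruction)
    return tool


if __name__ == "__main__":
    text = "The quick brown fox"
    synonym = make_tool("replace with a synonym")
    print(synonym(text, Selection(4, 9)))
```

In this sketch the user types only the short instruction; the selection supplies the target, which is one plausible reading of why the study reports shorter prompts with DirectGPT.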

Paper Structure (45 sections, 6 figures)

Figures (6)

  • Figure 1: DirectGPT used to replace words with synonyms: (a) selecting an object such as a word before writing a prompt forces the prompt to apply only to this object, localizing its effect; (b) once executed, the modifications are highlighted and the "synonym" ad hoc tool is created; (c) the "synonym" tool is used to quickly find synonyms for other words.
  • Figure 2: DirectGPT used to finish drawing a flower: (a) draw a line by referring to specific locations through drag-and-drop; (b) the line is drawn, click to specify where to add the circle; (c) refer to another circle to copy its size; (d) the circle is added.
  • Figure 3: Prompts can be reused as tools: (a) a prompt with two nouns is executed; (b) the prompt is abstracted into an ad hoc tool; (c) using the tool, a click sets the first noun; (d) a second click sets the second noun and executes the reused prompt.
  • Figure 4: Given the (a) starting image, the participant completed four tasks in the image activity with ChatGPT and DirectGPT from either the top or bottom row: (b) colourize using gradients; (c) add elements; (d) remove elements; (e) flip upside down.
  • Figure 5: Participants' responses when rating the 5-point statements for (a) DirectGPT and (b) ChatGPT. (c) Dots are the mean differences of DirectGPT compared to ChatGPT. Bars are the 95% CIs calculated with the studentized bootstrap method.
  • ...and 1 more figure