DrawTalking: Building Interactive Worlds by Sketching and Speaking

Karl Toby Rosenberg; Rubaiat Habib Kazi; Li-Yi Wei; Haijun Xia; Ken Perlin

TL;DR

DrawTalking presents a sketch-and-speech interface that enables open-ended creation and control of interactive worlds without coding. By coupling freehand sketching with narrative language, a transcript, and a semantics diagram, the system provides programming-like expressiveness while preserving user agency and transparency. An iPad prototype supports diverse demonstrations (ponds, games, windmills, day/night cycles), and a formative study highlights the value of aligning semantics with visuals, fluid storytelling, and rapid prototyping, along with limitations in natural-language understanding. The work points toward human-centered creative interfaces that blend narration with direct manipulation, and suggests integration with other tools, deeper semantic mappings, and more capable natural language processing as avenues for advancement.

Abstract

We introduce DrawTalking, an approach to building and controlling interactive worlds by sketching and speaking while telling stories. It emphasizes user control and flexibility, and gives programming-like capability without requiring code. An early open-ended study with our prototype shows that the mechanics resonate and are applicable to many creative-exploratory use cases, with the potential to inspire and inform research in future natural interfaces for creative exploration and authoring.

Paper Structure (55 sections, 26 figures)

Introduction
Related Work
Natural Language-Adjacent Interfaces
Dynamic Sketching Interfaces
Programming-Like Interfaces
Formative Steps
Methodology
Results and Observations
Association between spoken language, text, and drawing
Temporal synchronization between speech and drawing
Drawn versus non-drawn spoken content
Speed and flow
Content modification
Hierarchical object model
Design Goals
...and 40 more sections

Figures (26)

Figure 1: Interface Overview: An interface screenshot (from P4 in the open-ended user study). The toolbar (left) enables edit operations, e.g. copy, delete, attach/detach, and save sketch. The transcript view displays the user's speech input in an interactive panel, and the semantics diagram displays the machine's understanding of the input. "Speech controls" stage/confirm an action, discard input, and toggle speech recognition. "Transcript controls" offer quick transcript text selection. The status bar displays the current color and a compass pointing "up," and has the pen/eraser state-change buttons and indicators. The scene shown is just before the user confirms a command for the utterance "The character jumps on the platforms"; the command selects the sketch labeled "character" and all sketches labeled "platform." Tapping the "language action" button (top-right) stages the command and displays the semantics diagram (top-left), which represents the machine's understanding of the input with the selected objects shown under their respective words; the user confirms the command by tapping again. This causes the character to jump on all of the platforms. For details on the workflow and commands, see the sections on user interface elements and on language commands and functionality.
Figure 2: Overall workflow: Dog and boy's infinite game of fetch. a) From left to right, the user draws and labels the objects using multiple approaches at different stages of drawing. a.1) The user is midway through drawing the boy but can already label it by saying "This is a boy" while the pen is interacting with the object. a.2) The ball is already drawn but unlabeled; during the same sentence, the user can quickly tap the ball to label it. a.3) The user draws some water and uses free-form speech to say "water" without deixis (such as "this"/"that"). The user can select the object with touch and simultaneously tap the word "water" in the transcript to label the object with that word. a.4) Touch+pen on a word removes the label. Adjectives work the same way. b) Left: labeled sketches are commanded. Right: interactive user participation. The user can move objects while the system simulates their movements, and the simulation adjusts dynamically as the user plays around spontaneously.
Figure 3: Semantics Diagram: An example workflow of error correction (changing the target "building" from the red house to the yellow skyscraper). (a) Spoken user input in the transcript panel. (b) The semantics diagram generated from the user's input to visualize the system's interpretation. (c-d) Pen+touch interaction between diagram nouns and objects in the scene (including unlabeled ones) allows re-linking to modify or correct the machine's selections. (e) Confirm the language command. If a verb in a command is unknown, the user can pick from an auto-presented list of similar verbs or cancel (see the verb-substitution figure). In our implementation, if a sentence produces a diagram too long to fit in view, the user can zoom out of the diagram independently of the canvas.
Figure 4: Transcript View: Selection / Deselection of Words: Toggle words off/on via touch-dragging to modify the input for the next command. Small direct edits could be a desirable alternative to repeating the command verbally. As a fallback, the user can type with a keyboard to replace the text. (Left: quick operation buttons for selecting pieces of the text.)
Figure 5: Find Panel: for performing search queries on sketches' noun and adjective labels. (a) Selecting the word "tree" searches for all objects with that label; tapping an entry warps to the sketch; erasing an entry deletes the sketch instead; dragging an entry with the pen copies it. (b) Forgetting active actions and rules: selecting the word "actions" (in the transcript) and an object in the scene displays a panel of all current actions affecting the object. Deleting an action with the pen stops it immediately. Selecting the word "rules" (in the transcript) displays the active rules; tapping a rule with the pen toggles it off/on.
...and 21 more figures
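Figures 1, 2, and 5 describe how spoken nouns are bound to labeled sketches: "character" selects the sketch labeled "character," the plural "platforms" selects every sketch labeled "platform," and the find panel queries both noun and adjective labels. The Python sketch below illustrates that binding idea under stated assumptions. `Sketch`, `LabelIndex`, `label`, `unlabel`, `find`, and `resolve` are hypothetical names, not DrawTalking's implementation; plural detection is assumed to happen upstream (e.g. in a parser), and the rule that a singular noun takes the first match is an illustrative assumption.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Sketch:
    """Hypothetical model of a drawn object and the words used to label it."""
    sketch_id: int
    nouns: set[str] = field(default_factory=set)
    adjectives: set[str] = field(default_factory=set)


class LabelIndex:
    """Hypothetical store that binds spoken words to labeled sketches."""

    def __init__(self) -> None:
        self.sketches: dict[int, Sketch] = {}

    def add(self, sketch: Sketch) -> None:
        self.sketches[sketch.sketch_id] = sketch

    def label(self, sketch_id: int, noun: str) -> None:
        # Analogous to saying "This is a boy" while the pen touches the sketch.
        self.sketches[sketch_id].nouns.add(noun)

    def unlabel(self, sketch_id: int, noun: str) -> None:
        # Analogous to touch+pen on a word, which removes that label.
        self.sketches[sketch_id].nouns.discard(noun)

    def find(self, word: str) -> list[int]:
        """Find-panel style query over both noun and adjective labels."""
        return sorted(
            s.sketch_id
            for s in self.sketches.values()
            if word in s.nouns or word in s.adjectives
        )

    def resolve(self, noun: str, plural: bool) -> list[int]:
        """Bind a noun phrase to sketch ids; a plural selects every match.

        Assumption: a singular noun selects only the first match by id.
        """
        matches = sorted(
            s.sketch_id for s in self.sketches.values() if noun in s.nouns
        )
        return matches if plural else matches[:1]


# Usage mirroring Figure 1's utterance "The character jumps on the platforms".
index = LabelIndex()
index.add(Sketch(1, {"character"}))
index.add(Sketch(2, {"platform"}))
index.add(Sketch(3, {"platform"}, {"red"}))
print(index.resolve("character", plural=False))  # [1]
print(index.resolve("platform", plural=True))    # [2, 3]
print(index.find("red"))                          # [3]
```

Because each selection in this sketch is an explicit list of ids, a correction like the re-linking in Figure 3 could amount to swapping one id for another; this is a design assumption for illustration, not a description of the prototype.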
