Rambler: Supporting Writing With Speech via LLM-Assisted Gist Manipulation

Susan Lin, Jeremy Warner, J. D. Zamfirescu-Pereira, Matthew G. Lee, Sauhard Jain, Michael Xuelin Huang, Piyawat Lertvittayakumjorn, Shanqing Cai, Shumin Zhai, Björn Hartmann, Can Liu

TL;DR

Dictation accelerates long-form text entry but yields disfluent, verbose transcripts that are costly to edit. Rambler introduces a gist-based interface that structures spoken content into Rambles, automatically cleans transcripts, extracts gists via Semantic Zoom and keywords, and enables macro revisions powered by GPT-4, including respeaking, merging/splitting, and custom prompts. In a within-subject study, Rambler matched the baseline in final text quality while improving review, organization, and iteration, and users rated their sense of control higher. The work demonstrates that task-tailored LLM-guided GUIs with chunk-level editing can outperform generic chat-based LLM use for long-form writing on mobile, and offers design guidelines for embedding AI in gesture- and speech-driven writing tools.

Abstract

Dictation enables efficient text input on mobile devices. However, writing with speech can produce disfluent, wordy, and incoherent text and thus requires heavy post-processing. This paper presents Rambler, an LLM-powered graphical user interface that supports gist-level manipulation of dictated text with two main sets of functions: gist extraction and macro revision. Gist extraction generates keywords and summaries as anchors to support the review and interaction with spoken text. LLM-assisted macro revisions allow users to respeak, split, merge and transform dictated text without specifying precise editing locations. Together they pave the way for interactive dictation and revision that help close gaps between spontaneous spoken words and well-structured writing. In a comparative study with 12 participants performing verbal composition tasks, Rambler outperformed the baseline of a speech-to-text editor + ChatGPT, as it better facilitates iterative revisions with enhanced user control over the content while supporting surprisingly diverse user strategies.
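
The macro revisions named in the abstract (respeak, split, merge) can be pictured as operations on text chunks. The following is a minimal sketch under stated assumptions, not Rambler's implementation: the `Ramble` class, the function names, the prompt wording, and the convention that the LLM separates split parts with blank lines are all hypothetical. The LLM is passed in as a plain callable so that no specific API is assumed.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical names throughout; this is not Rambler's actual code.
LLM = Callable[[str], str]  # any function mapping a prompt to generated text


@dataclass
class Ramble:
    text: str
    keywords: List[str] = field(default_factory=list)


def respeak(ramble: Ramble, spoken: str, mode: str) -> Ramble:
    """Apply newly dictated text: 'append', 'replace', or 'discard'."""
    if mode == "append":
        return Ramble(f"{ramble.text} {spoken}".strip(), list(ramble.keywords))
    if mode == "replace":
        return Ramble(spoken)
    if mode == "discard":
        return ramble
    raise ValueError(f"unknown respeak mode: {mode}")


def semantic_merge(rambles: List[Ramble], llm: LLM) -> Ramble:
    """Ask the LLM to combine the selected Rambles into one passage."""
    joined = "\n\n".join(r.text for r in rambles)
    return Ramble(llm("Merge these passages into one coherent passage:\n\n" + joined))


def semantic_split(ramble: Ramble, llm: LLM) -> List[Ramble]:
    """Ask the LLM to split by content; assumes blank lines separate the parts."""
    reply = llm(
        "Split this text by topic, separating parts with a blank line:\n\n"
        + ramble.text
    )
    return [Ramble(part.strip()) for part in reply.split("\n\n") if part.strip()]
```

In this sketch the user never specifies an edit location: each operation takes whole Rambles as input and returns whole Rambles, which mirrors the chunk-level editing the abstract describes.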

Paper Structure (70 sections, 9 figures, 1 table)

Figures (9)

  • Figure 1: A labeled screenshot of the Rambler UI. (1) Ramble in default state, with revision functions accessible through buttons on (2) and (3). (4) Ramble in re-speaking mode, where voice input is transcribed so that it can be appended to the current text, replace the current text, or be discarded using the buttons on (5). Fixed at the bottom of the UI is (6), with the Semantic Merge button, New Ramble button, and Semantic Zoom slider.
  • Figure 2: From left to right, example text in three Rambles is presented at all four Semantic Zoom levels: full transcript, 50% length, 25% length, and 10% length.
  • Figure 3: Example Rambles before and after (a) Semantic Merge and (b) Semantic Split. In (a), the user selects all the Rambles to include, then presses the Semantic Merge button (shown in Figure 1). In (b), the user taps the scissors icon in a Ramble to ask the LLM to split it based on content.
  • Figure 4: Example of transforming a Ramble with Magic Custom Prompt: (top) keywords highlighted in light green; (bottom) custom prompt window triggered by the magic wand button on a Ramble. The user inputs an example prompt, and optionally ticks the checkbox for including keywords as context.
  • Figure 5: Rambler System Architecture Diagram, consisting of a web-based frontend and a cloud server that mediates between user requests and the OpenAI API for LLM functionality. The AssemblyAI API is used for real-time speech transcription.
  • ...and 4 more figures
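
The four Semantic Zoom levels in Figure 2 (full transcript, 50%, 25%, and 10% length) can be read as word budgets handed to a summarizer. The sketch below is illustrative only: the function names and the prompt wording are assumptions, and it further assumes that the full-length level needs no summarization request, which the source does not state.

```python
from typing import Optional

# Hypothetical sketch; names and prompt wording are assumptions.
ZOOM_LEVELS = (1.0, 0.5, 0.25, 0.1)  # full transcript, 50%, 25%, 10%


def target_words(transcript: str, level: float) -> int:
    """Word budget for a zoom level; at least one word for non-empty text."""
    n = len(transcript.split())
    if n == 0:
        return 0
    return max(1, round(n * level))


def zoom_prompt(transcript: str, level: float) -> Optional[str]:
    """Build a summarization prompt, or None when the full text is shown."""
    if level >= 1.0:
        return None  # assumption: the full level needs no summarization
    budget = target_words(transcript, level)
    return (
        f"Summarize the following text in about {budget} words, "
        f"keeping its main points:\n\n{transcript}"
    )
```

For a 40-word transcript, this gives budgets of 40, 20, 10, and 4 words across the four levels.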