Typist Experiment: an Investigation of Human-to-Human Dictation via Role-play to Inform Voice-based Text Authoring

Can Liu, Siying Hu, Li Feng, Mingming Fan

TL;DR

By analysing the natural language patterns of both authors and typists, this work identifies new challenges and opportunities for the design of future dictation interfaces, including the ambiguity of human dictation, the differences between audio-only and audio-plus-screen communication, and various forms of passive and active assistance that future systems could provide.

Abstract

Voice dictation is increasingly used for text entry, especially in mobile scenarios. However, the speech-based experience is disrupted when users must return to a screen and keyboard to review and edit the text. While existing dictation systems focus on improving transcription and error correction, little is known about how to support speech input for the entire text creation process, including composition, reviewing and editing. We conducted an experiment in which ten pairs of participants took on the roles of authors and typists to work on a text authoring task. By analysing the natural language patterns of both authors and typists, we identified new challenges and opportunities for the design of future dictation interfaces, including the ambiguity of human dictation, the differences between audio-only and audio-plus-screen communication, and various forms of passive and active assistance that future systems could provide.

Paper Structure (87 sections, 10 figures, 1 table)

Figures (10)

  • Figure 1: Example images, taken from the Dixit game, provided to participants for text composition.
  • Figure 2: Examples of how Authors' utterances are coded with the 8 categories.
  • Figure 3: Example visualizations of how text was composed and edited over the timeline of a trial. Each symbol represents the content development in one utterance: Create new content, Re-speak to modify, Re-speak to continue composition and Explicit editing. AO (Audio only) and AS (Audio+Screen) are the communication modalities between Authors and Typists. The four symbol sizes represent the unit size of the text operated on in the utterance: word, phrase, clause and multiple clauses. A red contour contains the first composition pass in the trial and a green contour contains one revision pass.
  • Figure 4: Example visualizations of the content development process. Text overwritten by re-speaking is aligned vertically across lines. The timeline of text generation runs from left to right and then from the top down. Each coloured text block is generated from one utterance. A darker orange indicates that the same content block was re-spoken more times. Blue blocks are spoken punctuation. Edits made by explicit requests are annotated with crossovers or insertion marks.
  • Figure 5: Percentages of occurrences of Explicit Editing Requests in Audio only and Audio+Screen.
  • ...and 5 more figures