
HelpViz: Automatic Generation of Contextual Visual Mobile Tutorials from Text-Based Instructions

Mingyuan Zhong, Gang Li, Peggy Chi, Yang Li

TL;DR

HelpViz transforms the text instructions that are abundant on the web into contextual, visual mobile tutorials. Its pipeline parses instructions into actions, executes them on a fleet of Android emulators to generate assets, and synthesizes tutorials that are aligned to the user's context and delivered through a mobile viewer with progress tracking. To cope with parsing errors and mismatches between instructions and the emulator, the system uses beam search to output multiple parsing alternatives and uses look-ahead to skip non-executable steps, which improves robustness. In a user study, participants preferred HelpViz tutorials over text-only guidance, citing clearer visuals and automatic progression, which suggests practical value for context-aware mobile help at scale. The work offers a scalable pathway to convert large text corpora into usable visual guidance across devices and app versions, with opportunities to extend to more complex workflows and desktop tasks.
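The robustness idea above (several parsing alternatives plus look-ahead over non-executable steps) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: `Action`, `FakeEmulator`, `run_candidate`, and `execute_with_lookahead` are hypothetical names, the emulator is a toy dictionary of screens, and the list of candidates stands in for the ranked output of a beam-search parser.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Action:
    verb: str    # e.g. "tap"
    target: str  # text of the UI element to act on


class FakeEmulator:
    """Toy stand-in for an Android emulator (hypothetical, for illustration)."""

    def __init__(self, screens: Dict[str, Dict[str, str]], start: str):
        self.screens = screens  # screen name -> {element text: next screen}
        self.start = start
        self.current = start

    def reset(self) -> None:
        self.current = self.start

    def try_action(self, action: Action) -> bool:
        """Perform the action if its target is on the current screen."""
        nxt = self.screens.get(self.current, {}).get(action.target)
        if nxt is None:
            return False
        self.current = nxt
        return True


def run_candidate(emu: FakeEmulator, actions: Sequence[Action],
                  lookahead: int) -> Optional[List[Action]]:
    """Execute one parse; skip up to `lookahead` non-executable steps at a time."""
    executed: List[Action] = []
    i = 0
    while i < len(actions):
        for j in range(i, min(i + lookahead + 1, len(actions))):
            if emu.try_action(actions[j]):
                executed.append(actions[j])
                i = j + 1
                break
        else:
            return None  # nothing within the look-ahead window could run
    return executed


def execute_with_lookahead(emu: FakeEmulator,
                           candidates: Sequence[Sequence[Action]],
                           lookahead: int = 1) -> Optional[List[Action]]:
    """Try parsing alternatives in ranked order; return the first that completes."""
    for actions in candidates:
        emu.reset()  # each alternative starts from a clean state
        executed = run_candidate(emu, actions, lookahead)
        if executed is not None:
            return executed
    return None


# Assumed toy app: this "version" has no "Apps & notifications" entry.
SCREENS = {
    "home": {"Settings": "settings"},
    "settings": {"Notifications": "notifications"},
    "notifications": {"On lock screen": "lock"},
}
BAD_PARSE = [Action("tap", "Settings app"), Action("tap", "Notifications")]
GOOD_PARSE = [Action("tap", "Settings"), Action("tap", "Apps & notifications"),
              Action("tap", "Notifications"), Action("tap", "On lock screen")]

emulator = FakeEmulator(SCREENS, start="home")
result = execute_with_lookahead(emulator, [BAD_PARSE, GOOD_PARSE], lookahead=1)
```

In this toy run the first alternative cannot be executed at all, so the second is tried, and the step that does not exist in this app version is skipped. With `lookahead=0` the same input fails, which is the brittleness the look-ahead is meant to reduce.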

Abstract

We present HelpViz, a tool for generating contextual visual mobile tutorials from text-based instructions that are abundant on the web. HelpViz transforms text instructions to graphical tutorials in batch, by extracting a sequence of actions from each text instruction through an instruction parsing model, and executing the extracted actions on a simulation infrastructure that manages an array of Android emulators. The automatic execution of each instruction produces a set of graphical and structural assets, including images, videos, and metadata such as clicked elements for each step. HelpViz then synthesizes a tutorial by combining parsed text instructions with the generated assets, and contextualizes the tutorial to user interaction by tracking the user's progress and highlighting the next step. Our experiments with HelpViz indicate that our pipeline improved tutorial execution robustness and that participants preferred tutorials generated by HelpViz over text-based instructions. HelpViz promises a cost-effective approach for generating contextual visual tutorials for mobile interaction at scale.
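One way to picture "tracking the user's progress and highlighting the next step" is to compare the text of on-screen elements with the target of each tutorial step. The sketch below is a hedged illustration only: `find_current_step` and its exact-match rule are assumptions, and the matching HelpViz actually uses is not specified in this summary. The step targets are taken from the "turning off all notifications" example instruction.

```python
from typing import Iterable, Optional, Sequence


def find_current_step(step_targets: Sequence[str],
                      screen_texts: Iterable[str],
                      last_step: int = -1) -> Optional[int]:
    """Return the index of the earliest step after `last_step` whose target
    text appears on the current screen, or None if no step matches.

    Matching is case-insensitive exact text equality (an assumption)."""
    visible = {text.strip().lower() for text in screen_texts}
    for idx in range(last_step + 1, len(step_targets)):
        if step_targets[idx].strip().lower() in visible:
            return idx
    return None


# Targets of the five steps in the example instruction.
STEPS = [
    "Settings",
    "Apps & notifications",
    "Notifications",
    "On lock screen",
    "Don't show notifications at all",
]

# Hypothetical element texts on the screen reached after Step 2.
screen = ["Notifications", "Default apps", "App permissions"]
step_index = find_current_step(STEPS, screen, last_step=1)
```

Here `step_index` is the zero-based index of Step 3, which a viewer could then scroll to and highlight. Searching only past `last_step` keeps the tutorial from jumping backwards when an earlier label reappears on a later screen.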

Paper Structure

This paper contains 22 sections, 3 figures, and 2 tables.

Figures (3)

  • Figure 1: HelpViz converts text instructions, which are abundant on the web, into visual tutorials and then serves these tutorials in the context of mobile interaction. (1)--(4): Our pipeline converts text instructions to visual tutorials. HelpViz generates assets such as screenshots and interaction metadata for each instruction by actually executing it on a simulation infrastructure. It then synthesizes visual tutorials by combining these assets with segmented text instructions on a mobile device. (5): HelpViz Viewer serves the visual tutorial in the interaction context of the user.
  • Figure 2: HelpViz Viewer serves an automatically-generated visual tutorial in the context of user interaction. In this example, the tutorial illustrates the task of "turning off all notifications." The original text instruction is: "Open your device's Settings app. Tap Apps & notifications. Click Notifications. Tap On lock screen. Click Don't show notifications at all." (Formatting added by the authors for clarity.) Note that in (b), elements with text matched to Step 3 are highlighted to illustrate contextual matching. The matching allows HelpViz to scroll to and highlight that step, illustrated in (c). The blue boxes in (b) are not present during user interaction.
  • Figure 3: The participants' preferences for each technique.