GenComUI: Exploring Generative Visual Aids as Medium to Support Task-Oriented Human-Robot Communication

Yate Ge, Meiying Li, Xipeng Huang, Yuanda Hu, Qi Wang, Xiaohua Sun, Weiwei Guo

TL;DR

GenComUI investigates generative visual aids as a medium to support task-oriented human-robot communication by integrating an LLM-powered dynamic UI that generates context-aware visuals and code. The paper reports a formative study informing visual design and a within-subject user study (n=20) showing improvements in communication quality, perceived usability, and task understanding when visuals accompany voice, though task completion time remains similar. Key contributions include a formative study on visual tool usage, a functional GenComUI prototype with four integrated modules, an empirical evaluation, and design implications for adaptive visual aids in HRI and end-user development. The findings suggest that dynamically generated visuals can enhance naturalistic human-robot collaboration and guide future multimodal interfaces and end-user programming tools.

Abstract

This work investigates the integration of generative visual aids in human-robot task communication. We developed GenComUI, a system powered by large language models that dynamically generates contextual visual aids (such as map annotations, path indicators, and animations) to support verbal task communication and facilitate the generation of customized task programs for the robot. This system was informed by a formative study that examined how humans use external visual tools to assist verbal communication in spatial tasks. To evaluate its effectiveness, we conducted a user experiment (n = 20) comparing GenComUI with a voice-only baseline. Both qualitative and quantitative analyses indicate that generative visual aids enhance verbal task communication by providing continuous visual feedback, thus promoting natural and effective human-robot communication. Additionally, the study offers a set of design implications, emphasizing how dynamically generated visual aids can serve as an effective communication medium in human-robot interaction. These findings underscore the potential of generative visual aids to inform the design of more intuitive and effective human-robot communication, particularly for complex communication scenarios in human-robot interaction and LLM-based end-user development.

Paper Structure

This paper contains 59 sections, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Sample maps drawn by participants during the formative study, showing how they used visual elements to represent and communicate spatial tasks through annotations, paths, and markers.
  • Figure 2: The communication process in GenComUI: (A) Multi-turn and iterative communication for robot task customization, showing progressive visual feedback as users specify and modify task requirements; (B) Task requirement confirmation phase with step-by-step visual and verbal verification; (C) Code deployment and on-robot testing phase enabling real-world validation of the specified task.
  • Figure 3: GenComUI system architecture: (A) Voice Interaction Module enabling bidirectional voice communication between users and robot; (B) User Intention Understanding Module analyzing user input and dialogue context to generate structured outputs; (C) Generative Visual Aids Module creating dynamic visual interface elements and animations on a spatial map; (D) Task Program Synthesis and Deployment Module for generating and executing robot code.
  • Figure 4: Visual design elements and animation specifications in GenComUI: Color palette for different task components, line types (solid and dashed) for path representation, text labels for spatial behavior description, icon and number markers for location identification, task steps and robot behaviors, and animation effects (fade-in/fade-out) for visual feedback.
  • Figure 5: Experimental setup for system comparison: (Left) Paper floor plan provided to participants during baseline testing; (Right) Study environment showing interface comparison between baseline system (displaying a facial expression) and GenComUI (showing generative visual aids).
  • ...and 2 more figures
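
The four-module data flow described in the Figure 3 caption can be illustrated with a minimal Python sketch. This is not the authors' implementation: every name here (`Intent`, `VisualAid`, `understand`, `generate_visual_aids`, `synthesize_program`, and the `robot.go_to` call in the generated text) is hypothetical, the LLM is replaced by a simple keyword match, and the voice module is reduced to plain strings. It only shows how one structured output could feed both the visual aids and the task program.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Intent:
    """Hypothetical structured output of the intention-understanding step."""
    reply: str                                   # text the robot would speak back
    locations: List[str] = field(default_factory=list)


@dataclass
class VisualAid:
    """Hypothetical drawing instruction for the spatial map."""
    kind: str            # "marker" or "path"
    label: str
    style: str = "solid"


def understand(utterance: str, known_locations: List[str]) -> Intent:
    # Stand-in for the LLM call (assumption): keep the known locations
    # that appear in the utterance, in the order they are mentioned.
    lowered = utterance.lower()
    mentioned = sorted(
        (loc for loc in known_locations if loc.lower() in lowered),
        key=lambda loc: lowered.index(loc.lower()),
    )
    reply = "I will visit: " + ", ".join(mentioned) if mentioned else "Where should I go?"
    return Intent(reply=reply, locations=mentioned)


def generate_visual_aids(intent: Intent) -> List[VisualAid]:
    # Numbered markers for each stop, then a dashed path between consecutive stops.
    aids = [VisualAid("marker", f"{i}. {loc}") for i, loc in enumerate(intent.locations, 1)]
    for start, end in zip(intent.locations, intent.locations[1:]):
        aids.append(VisualAid("path", f"{start} -> {end}", style="dashed"))
    return aids


def synthesize_program(intent: Intent) -> str:
    # Emits text for a hypothetical robot API; nothing is executed here.
    return "\n".join(f"robot.go_to({loc!r})" for loc in intent.locations)


if __name__ == "__main__":
    intent = understand("Go to the kitchen, then the lobby", ["Lobby", "Kitchen"])
    print(intent.reply)
    for aid in generate_visual_aids(intent):
        print(aid)
    print(synthesize_program(intent))
```

Deriving both the visual aids and the program from the same `Intent` object is a design choice of this sketch: it keeps what the user sees on the map consistent with what would be deployed, which is the role the caption assigns to the confirmation phase.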