SiSCo: Signal Synthesis for Effective Human-Robot Communication Via Large Language Models

Shubham Sonawani, Fabian Weigend, Heni Ben Amor

TL;DR

SiSCo is a novel framework that combines the computational power of LLMs with mixed-reality technologies to streamline the creation of visual cues for human-robot collaboration. It reduces cognitive load for participants and receives above-average user ratings for on-the-fly signals generated for unseen objects.

Abstract

Effective human-robot collaboration hinges on robust communication channels, with visual signaling playing a pivotal role due to its intuitive appeal. Yet, the creation of visually intuitive cues often demands extensive resources and specialized knowledge. The emergence of Large Language Models (LLMs) offers promising avenues for enhancing human-robot interactions and revolutionizing the way we generate context-aware visual cues. To this end, we introduce SiSCo, a novel framework that combines the computational power of LLMs with mixed-reality technologies to streamline the creation of visual cues for human-robot collaboration. Our results show that SiSCo improves the efficiency of communication in human-robot teaming tasks, reducing task completion time by approximately 73% and increasing task success rates by 18% compared to baseline natural language signals. Additionally, SiSCo reduces cognitive load for participants by 46%, as measured by the NASA-TLX subscale, and receives above-average user ratings for on-the-fly signals generated for unseen objects. To encourage further development and broader community engagement, we provide full access to SiSCo's implementation and related materials on our GitHub repository.

Paper Structure

This paper contains 12 sections, 10 figures, 3 tables.

Figures (10)

  • Figure 1: A participant engaging in a human-robot teaming task, with the SiSCo framework mediating by delivering visual signals via a mixed reality interface.
  • Figure 2: Left: The physical setup of the teaming task: The robot places objects on the tabletop surface environment. When the robot needs help, it uses SiSCo to present synthesized signals through a projector (A) or a monitor (B) to the human. Right: The task procedure during the human-robot teaming task.
  • Figure 3: Object signals generated by SiSCo (top row) and DALL-E 3 (bottom row) for the same input prompt.
  • Figure 4: The teaming task as a schematic: The robot assembles a Z-shaped structure on the tabletop using the objects B, C, D and the red rocket. The robot placed the first three objects but malfunctioned when placing the red rocket. SiSCo projects a visual signal to instruct the human to assist.
  • Figure 5: The architecture of our Signal Synthesizing Communication System (SiSCo). It takes in a task prompt and produces visual and natural language signals.
  • ...and 5 more figures