Understanding Large-Language Model (LLM)-powered Human-Robot Interaction

Callie Y. Kim, Christine P. Lee, Bilge Mutlu

TL;DR

This work probes the design requirements for deploying large-language models in physically embodied robots by comparing an LLM-powered robot with text- and voice-based agents across four tasks organized by the task circumplex. Using a mixed-methods study with 32 participants, it reveals that LLM-enabled robots elevate expectations for non-verbal cues and perform well in social connection and deliberation, yet can induce anxiety and suffer from linguistic/logic failures in certain tasks. The study provides concrete design implications for both robot embodiment and LLM fine-tuning, recommending task-aware customization and robust non-verbal integration to enhance HRI. Overall, the findings inform how to tailor LLMs and robot design to maximize effective human-robot collaboration in real-world contexts.

Abstract

Large-language models (LLMs) hold significant promise in improving human-robot interaction, offering advanced conversational skills and versatility in managing diverse, open-ended user requests in various tasks and domains. Despite the potential to transform human-robot interaction, very little is known about the distinctive design requirements for utilizing LLMs in robots, which may differ from text and voice interaction and vary by task and context. To better understand these requirements, we conducted a user study (n = 32) comparing an LLM-powered social robot against text- and voice-based agents, analyzing task-based requirements in conversational tasks, including choose, generate, execute, and negotiate. Our findings show that LLM-powered robots elevate expectations for sophisticated non-verbal cues and excel in connection-building and deliberation, but fall short in logical communication and may induce anxiety. We provide design implications both for robots integrating LLMs and for fine-tuning LLMs for use with robots.

Paper Structure (41 sections, 4 figures)

This paper contains 41 sections and 4 figures.

Figures (4)

  • Figure 1: We investigate people's perceptions of and preferences toward LLM-powered robots. We conducted a user study that compared an LLM-powered social robot against text-based and voice-based agents. Left: Users participated in one of four tasks: choose, generate, execute, and negotiate. Right: The user engages with (1) the text-based agent by entering and receiving text-based prompts, (2) the voice-based agent through spoken prompts (achieved by the robot's voice with the robot concealed behind a black screen, out of the user's view), and (3) the LLM-powered social robot via spoken prompts, in a counterbalanced order.
  • Figure 2: Interaction Examples for Each Task --- Participants were assigned to one of the four tasks (i.e., execute, negotiate, choose, and generate) and engaged with all three types of agents (i.e., text, voice, and robot). Clockwise from top left: interaction examples for the four tasks.
  • Figure 3: Boxplots with data points overlaid on user satisfaction, length of input prompts, and interaction failures. Embodiment: (T)ext, (V)oice, (R)obot. Tasks: (N)egotiate, (G)enerate, (C)hoose, (E)xecute. Horizontal lines indicate significant pairwise comparisons with Tukey HSD ($p < .05^{\ast}$, $p < .01^{\ast\ast}$, $p < .001^{\ast\ast\ast}$).
  • Figure 4: Summary of Qualitative Findings --- Our findings indicate user preference for LLM-powered robots in the execution and negotiation tasks. These tasks necessitated the establishment of social relationships and rapport, and the robot's social aspects benefited from effective synergy with LLM capabilities. LLM-powered robots were less favored in the choice and generation tasks. In these cases, the robot's interaction medium and its social presence hindered optimal user performance. Additionally, a higher occurrence of technical communication errors contributed to participants' lower preference for robot agents.
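
The Figure 3 caption marks significant pairwise comparisons with one, two, or three asterisks depending on the p-value threshold. As a reading aid, the mapping can be sketched as a small helper. This is a minimal illustration that assumes only the thresholds stated in the caption; the function name `significance_stars` is hypothetical and is not taken from the paper.

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the asterisk notation described in the Figure 3 caption.

    Hypothetical helper for illustration; thresholds follow the caption:
    p < .05 -> *, p < .01 -> **, p < .001 -> ***.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    # Check the strictest threshold first so each p-value gets its strongest label.
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""  # not significant at the .05 level


if __name__ == "__main__":
    # Example p-values chosen for illustration only; they are not results from the study.
    for p in (0.2, 0.03, 0.004, 0.0002):
        print(f"p = {p}: '{significance_stars(p)}'")
```

The thresholds are checked from strictest to loosest because they are nested: a p-value below .001 is also below .05, and should receive only its strongest marker.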