Understanding Large-Language Model (LLM)-powered Human-Robot Interaction
Callie Y. Kim, Christine P. Lee, Bilge Mutlu
TL;DR
This work probes the design requirements for deploying large-language models in physically embodied robots by comparing an LLM-powered robot with text- and voice-based agents across four tasks organized by the task circumplex. A mixed-methods study with 32 participants shows that LLM-powered robots raise expectations for non-verbal cues and perform well in social connection and deliberation, yet can induce anxiety and suffer from linguistic and logical failures in certain tasks. The study offers design implications for both robot embodiment and LLM fine-tuning, recommending task-aware customization and robust non-verbal integration to improve human-robot interaction (HRI). Overall, the findings inform how to tailor LLMs and robot design to support effective human-robot collaboration in real-world contexts.
Abstract
Large-language models (LLMs) hold significant promise in improving human-robot interaction, offering advanced conversational skills and versatility in managing diverse, open-ended user requests in various tasks and domains. Despite the potential to transform human-robot interaction, very little is known about the distinctive design requirements for utilizing LLMs in robots, which may differ from text and voice interaction and vary by task and context. To better understand these requirements, we conducted a user study (n = 32) comparing an LLM-powered social robot against text- and voice-based agents, analyzing task-based requirements in conversational tasks, including choose, generate, execute, and negotiate. Our findings show that LLM-powered robots elevate expectations for sophisticated non-verbal cues and excel in connection-building and deliberation, but fall short in logical communication and may induce anxiety. We provide design implications both for robots integrating LLMs and for fine-tuning LLMs for use with robots.
