Generative Expressive Robot Behaviors using Large Language Models
Karthik Mahadevan, Jonathan Chien, Noah Brown, Zhuo Xu, Carolina Parada, Fei Xia, Andy Zeng, Leila Takayama, Dorsa Sadigh
TL;DR
The paper addresses the challenge of generating expressive robot behaviors without extensive task-specific data or rigid rule templates by introducing GenEM, a modular pipeline that leverages large language models to translate natural language instructions into executable robot control code. Through few-shot chain-of-thought prompting and a sequence of LLM-driven modules, GenEM reasons about social norms, maps human expressive intent to robot actions, and supports iterative refinement from user feedback, with demonstrated cross-embodiment applicability. User studies and ablations show that GenEM, especially with iterative feedback (GenEM++), produces behaviors perceived as competent and understandable, often rivaling baselines created by a professional animator and generalizing across robot embodiments. The approach reduces the need for curated datasets, supports composable and adaptive expressive behaviors, and holds promise for rapid deployment of naturalistic human-robot interaction on varied embodiments.
Abstract
People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation the robot is used in. We propose to leverage the rich social context available from large language models (LLMs), together with their ability to generate motion based on instructions or user preferences, to generate expressive robot behaviors that are adaptable and composable, so that behaviors can build upon one another. Our approach utilizes few-shot chain-of-thought prompting to translate human language instructions into parametrized control code using the robot's available and learned skills. Through user studies and simulation experiments, we demonstrate that our approach produces behaviors that users found to be competent and easy to understand. Supplementary material can be found at https://generative-expressive-motion.github.io/.
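
To make the prompting idea concrete, the following is a minimal sketch of how a chain of few-shot chain-of-thought prompts could turn a language instruction into control code. It is not the authors' implementation: the stage descriptions, the `Example` fields, the canned `fake_llm` stand-in, and the `robot.tilt_head` skill are all hypothetical names chosen for illustration. In practice `fake_llm` would be replaced by a call to a real language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Any text-in, text-out model can be plugged in here (assumption: a plain callable).
LLM = Callable[[str], str]


@dataclass
class Example:
    """One few-shot demonstration: input, intermediate reasoning, final answer."""
    instruction: str
    reasoning: str
    output: str


Stage = Tuple[str, List[Example]]  # (task description, few-shot examples)


def build_prompt(task: str, examples: List[Example], query: str) -> str:
    """Assemble a few-shot chain-of-thought prompt that ends at 'Reasoning:'."""
    blocks = [task]
    for ex in examples:
        blocks.append(
            f"Input: {ex.instruction}\nReasoning: {ex.reasoning}\nOutput: {ex.output}"
        )
    blocks.append(f"Input: {query}\nReasoning:")
    return "\n\n".join(blocks)


def extract_output(completion: str) -> str:
    """Keep only the text after the last 'Output:' marker, dropping the reasoning."""
    return completion.rpartition("Output:")[2].strip()


def run_pipeline(llm: LLM, instruction: str, stages: List[Stage]) -> List[str]:
    """Feed each stage's answer into the next stage; return every stage's answer."""
    text = instruction
    outputs = []
    for task, examples in stages:
        text = extract_output(llm(build_prompt(task, examples, text)))
        outputs.append(text)
    return outputs


# Hypothetical three-stage chain: social reasoning, robot mapping, code generation.
STAGES: List[Stage] = [
    ("Describe how a person would respond expressively.",
     [Example("A friend waves at you.", "Waving back is polite.", "wave back")]),
    ("Map the human behavior onto the robot's capabilities.",
     [Example("wave back", "The robot has no arms but has a head.", "pan head left and right")]),
    ("Write control code using the robot's skills.",
     [Example("pan head left and right", "Two pan commands suffice.",
              "robot.pan_head(0.3); robot.pan_head(-0.3)")]),
]


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real model, keyed on the task description."""
    if prompt.startswith("Describe how a person"):
        return " Being looked at calls for acknowledgment.\nOutput: nod at the person"
    if prompt.startswith("Map the human behavior"):
        return " The robot has a tilting head.\nOutput: tilt head down then up once"
    return " A nod is two tilt commands.\nOutput: robot.tilt_head(-0.3); robot.tilt_head(0.0)"


if __name__ == "__main__":
    for step in run_pipeline(fake_llm, "A person glances at you.", STAGES):
        print(step)
```

Each stage sees only the previous stage's answer, so an individual stage can be swapped or re-run in isolation. That separation is one plausible way to support the composability and feedback-driven refinement described above.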
