Generative Expressive Robot Behaviors using Large Language Models

Karthik Mahadevan, Jonathan Chien, Noah Brown, Zhuo Xu, Carolina Parada, Fei Xia, Andy Zeng, Leila Takayama, Dorsa Sadigh

TL;DR

The paper addresses the challenge of generating expressive robot behaviors without extensive task-specific data or rigid rule templates. It introduces Generative Expressive Motion (GenEM), a modular pipeline in which large language models translate natural-language instructions into executable robot control code. Using few-shot chain-of-thought prompting across a sequence of LLM-driven modules, GenEM reasons about social norms, maps how a human would express an intent onto the robot's available actions, and refines behaviors iteratively from user feedback. User studies and ablations show that GenEM, especially with iterative feedback (GenEM++), produces behaviors that participants found competent and easy to understand, often rivaling animator-designed baselines, and that the approach generalizes across robot embodiments. The approach reduces the need for curated datasets, supports composable and adaptive expressive behaviors, and could enable rapid deployment of natural human-robot interaction on varied robots.
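
The chain of modules summarized above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the `llm` callable, every prompt string, the class name `GenEMSketch`, and the use of an LLM yes/no query to route feedback are hypothetical. Variable names follow the notation of the Figure 2 caption ($l_{in}$, $h$, $r_{pre}$, $c$, $f_i$).

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interface: a frozen LLM maps a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class GenEMSketch:
    """Illustrative chain of frozen-LLM modules; all prompts are placeholders."""

    llm: LLM
    robot_capabilities: str  # r_pre: text description of pre-existing skills
    learned_behaviors: List[str] = field(default_factory=list)

    def expressive_instruction_following(self, l_in: str) -> str:
        # Reason about social norms; return how a human might express l_in (h).
        return self.llm(f"How would a person express this behavior: {l_in}")

    def robot_expressive_behavior(self, h: str) -> str:
        # Translate h into a robot procedure using r_pre and learned behaviors.
        context = "\n".join([self.robot_capabilities, *self.learned_behaviors])
        return self.llm(f"{context}\nWrite a robot procedure for: {h}")

    def code_generation(self, procedure: str) -> str:
        # Turn the procedure into parametrized robot code (c).
        return self.llm(f"Write robot code for: {procedure}")

    def run(self, l_in: str) -> Tuple[str, str]:
        h = self.expressive_instruction_following(l_in)
        procedure = self.robot_expressive_behavior(h)
        return procedure, self.code_generation(procedure)

    def apply_feedback(self, procedure: str, code: str, f_i: str) -> Tuple[str, str]:
        # Routing step (assumed here to be an LLM yes/no query): does f_i
        # change the procedure itself, or only the generated code?
        decision = self.llm(f"Does this feedback change the procedure? {f_i}")
        if decision.strip().lower().startswith("yes"):
            # Re-run the behavior module first, then code generation.
            procedure = self.robot_expressive_behavior(f"{procedure}\nFeedback: {f_i}")
        # Code generation always re-runs, conditioned on the latest feedback.
        return procedure, self.code_generation(
            f"{procedure}\nPrevious code: {code}\nFeedback: {f_i}"
        )
```

Passing a stub callable as `llm` makes the control flow testable without a model; in practice `llm` would wrap whichever frozen LLM is available.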

Abstract

People employ expressive behaviors to effectively communicate and coordinate their actions with others, such as nodding to acknowledge a person glancing at them or saying "excuse me" to pass people in a busy corridor. We would like robots to also demonstrate expressive behaviors in human-robot interaction. Prior work proposes rule-based methods that struggle to scale to new communication modalities or social situations, while data-driven methods require specialized datasets for each social situation the robot is used in. We propose to leverage the rich social context available from large language models (LLMs) and their ability to generate motion based on instructions or user preferences, to generate expressive robot motion that is adaptable and composable, with new behaviors building upon previously learned ones. Our approach utilizes few-shot chain-of-thought prompting to translate human language instructions into parametrized control code using the robot's available and learned skills. Through user studies and simulation experiments, we demonstrate that our approach produces behaviors that users found to be competent and easy to understand. Supplementary material can be found at https://generative-expressive-motion.github.io/.
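
The few-shot chain-of-thought prompting described in the abstract can be illustrated with a small prompt builder. This is a sketch under assumptions: the `Example` record, the `Instruction:`/`Reasoning:`/`Code:` layout, and the skill names `pan_head`, `nod`, and `say` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Example:
    """One hypothetical few-shot demonstration."""

    instruction: str  # natural-language request
    reasoning: str    # worked chain of thought about social norms
    code: str         # target parametrized control code


def build_fewshot_cot_prompt(
    skills: Sequence[str], examples: Sequence[Example], instruction: str
) -> str:
    """Assemble a few-shot chain-of-thought prompt (hypothetical layout)."""
    lines: List[str] = ["Available robot skills:"]
    lines += [f"- {skill}" for skill in skills]
    for ex in examples:
        lines += ["", f"Instruction: {ex.instruction}",
                  f"Reasoning: {ex.reasoning}", f"Code: {ex.code}"]
    # The query ends at "Reasoning:" so the model reasons before writing code.
    lines += ["", f"Instruction: {instruction}", "Reasoning:"]
    return "\n".join(lines)


prompt = build_fewshot_cot_prompt(
    skills=["pan_head(angle_deg)", "nod(times)", "say(text)"],  # hypothetical skills
    examples=[Example(
        instruction="Acknowledge a person who glances at you.",
        reasoning="A person would nod once; the robot can do the same with its head.",
        code="nod(times=1)",
    )],
    instruction="Ask people in a busy corridor to let you pass.",
)
print(prompt)
```

Ending the prompt at `Reasoning:` is what distinguishes this from plain few-shot prompting: the model is steered to write out its reasoning before producing the `Code:` line.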

Paper Structure

This paper contains 8 sections, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: We present Generative Expressive Motion (GenEM), a new approach to autonomously generate expressive robot behaviors. GenEM takes a desired expressive behavior (or a social context) as language instructions, reasons about human social norms, and generates control code for a robot using pre-existing robot skills and learned expressive behaviors. Iterative feedback can quickly modify the behavior according to user preferences. Here, the * symbols denote frozen large language models.
  • Figure 2: Generative Expressive Motion. Given a language instruction $l_{in}$, the Expressive Instruction Following module reasons about social norms and outputs how a human might express this behavior ($h$). This is translated into a procedure for robot expressive behavior using a prompt that describes the robot's pre-existing capabilities ($r_{pre}$) and any learned expressive behaviors. The procedure is then used to generate parametrized robot code $c$ that can be executed. The user can provide iterative feedback $f_i$ on the behavior; the feedback is processed to decide whether to re-run the robot behavior module followed by the code generation module, or only the code generation module. Note: the * shown on top of each gray module denotes a frozen LLM.
  • Figure 3: Behaviors tested in the two user studies. Behaviors labelled in green are unique to the first study, and those labelled in blue are unique to the second study. The remaining eight behaviors were common to both studies.
  • Figure 4: Plots showing participants' survey responses to three questions about each of the 10 behaviors in each of the 3 conditions in the first user study. Bars at the top denote significant differences, where (*) denotes p<.05 and (**) denotes p<.001. Error bars represent standard error. The first plot shows the average score for each question across conditions. Arrows indicate the direction of better scores.
  • Figure 5: Plots showing participants' survey responses to three questions about each of the 10 behaviors in each of the 3 conditions in the second user study. Bars at the top denote significant differences, where (*) denotes p<.05 and (**) denotes p<.001. Error bars represent standard error. The first plot shows the average score for each question across conditions. Arrows indicate the direction of better scores.
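
Figures 4 and 5 plot mean scores with standard-error bars and mark significant differences with (*) for p<.05 and (**) for p<.001. The sketch below computes only those two plotted quantities. It assumes the standard error uses the sample standard deviation (n - 1 denominator), which the captions do not state, and it does not reproduce the statistical tests the authors ran; the ratings in the example are hypothetical, not data from the paper.

```python
import math
import statistics
from typing import Sequence


def standard_error(scores: Sequence[float]) -> float:
    """Standard error of the mean; assumes the sample SD (n - 1 denominator)."""
    # statistics.stdev raises StatisticsError for fewer than two scores.
    return statistics.stdev(scores) / math.sqrt(len(scores))


def significance_marker(p: float) -> str:
    """Map a p-value to the bar labels used in the figure captions."""
    if p < 0.001:
        return "**"
    if p < 0.05:
        return "*"
    return ""


# Example with hypothetical ratings (not data from the paper).
print(round(standard_error([4, 5, 6, 5]), 4))  # prints 0.4082
print(significance_marker(0.02), significance_marker(0.0004))  # prints * **
```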