EMOTION: Expressive Motion Sequence Generation for Humanoid Robots with In-Context Learning

Peide Huang, Yuhan Hu, Nataliya Nechyporenko, Daehwa Kim, Walter Talbott, Jian Zhang

TL;DR

EMOTION leverages the in-context learning capability of large language models (LLMs) to dynamically generate socially appropriate gesture motion sequences for human-robot interaction. The paper also offers design implications, identifying a set of variables for future research to consider when generating expressive robotic gestures.

Abstract

This paper introduces a framework, called EMOTION, for generating expressive motion sequences in humanoid robots, enhancing their ability to engage in humanlike non-verbal communication. Non-verbal cues such as facial expressions, gestures, and body movements play a crucial role in effective interpersonal interactions. Despite the advancements in robotic behaviors, existing methods often fall short in mimicking the diversity and subtlety of human non-verbal communication. To address this gap, our approach leverages the in-context learning capability of large language models (LLMs) to dynamically generate socially appropriate gesture motion sequences for human-robot interaction. We use this framework to generate 10 different expressive gestures and conduct online user studies comparing the naturalness and understandability of the motions generated by EMOTION and its human-feedback version, EMOTION++, against those by human operators. The results demonstrate that our approach either matches or surpasses human performance in generating understandable and natural robot motions under certain scenarios. We also provide design implications for future research to consider a set of variables when generating expressive robotic gestures.

Paper Structure

This paper contains 21 sections, 5 figures, 1 table.

Figures (5)

  • Figure 1: Overview of the EMOTION framework.
  • Figure 2: Motion sequence representation.
  • Figure 3: (a) The 10 robot expressive gestures tested, grouped into four non-verbal gesture categories: emblems, illustrators, affective displays, and regulators. (b) Illustration of the survey workflow.
  • Figure 4: User-rated scores of understandability and naturalness for the generated robot expressive behaviors, separated by gesture. * and ** indicate statistical significance under one-way ANOVA (* indicates $p<0.05$ and ** indicates $p<0.01$). The error bars indicate the standard error (SE) of the mean.
  • Figure 5: Correlations between perceived naturalness / understandability and demographic variables, including participants' age, gender, general attitude toward robots, empathy, and frequency of using hand gestures in daily life. (* indicates $p<0.05$ and ** indicates $p<0.01$, suggesting a significant correlation.) The shaded areas indicate the $95\%$ confidence interval, while the error bars indicate standard error (SE).