Generation of Real-time Robotic Emotional Expressions Learning from Human Demonstration in Mixed Reality
Chao Wang, Michael Gienger, Fan Zhang
TL;DR
This work tackles the challenge of producing natural, emotionally expressive robot behavior by learning from human demonstrations gathered in mixed reality. It introduces an MR data-collection platform that maps facial cues and gestures to robot components and couples it with a flow-matching, emotion-conditioned generator to synthesize continuous robot poses in real time. The system is demonstrated on a real robot with adaptive visual feedback to mitigate motion sickness, and it is supported by an Emotional-Expression Dataset covering seven emotions at 10 Hz. Preliminary results confirm real-time capability and reveal directions for temporal modeling improvements, dataset expansion, and user studies to quantify recognizability and naturalness.
Abstract
Expressive behaviors in robots are critical for effectively conveying their emotional states during interactions with humans. In this work, we present a framework that autonomously generates realistic and diverse robotic emotional expressions based on expert human demonstrations captured in Mixed Reality (MR). Our system enables experts to teleoperate a virtual robot from a first-person perspective, capturing their facial expressions, head movements, and upper-body gestures, and mapping these behaviors onto corresponding robotic components including eyes, ears, neck, and arms. Leveraging a flow-matching-based generative process, our model learns to produce coherent and varied behaviors in real time in response to moving objects, conditioned explicitly on given emotional states. A preliminary test validated the effectiveness of our approach for generating autonomous expressions.
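To make the idea of an emotion-conditioned flow-matching generator concrete, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes the common linear-interpolation form of flow matching and a one-hot emotion label. The pose dimension `POSE_DIM`, the helper names, and the stand-in `toy_velocity` field are all hypothetical; in a real system the velocity field would be a trained network, and conditioning on the moving object is omitted here for brevity.

```python
import numpy as np

NUM_EMOTIONS = 7   # the dataset is described as covering seven emotions
POSE_DIM = 12      # hypothetical number of robot joint values (assumption)

def one_hot(emotion_id, n=NUM_EMOTIONS):
    """Encode an emotion index as a one-hot conditioning vector."""
    v = np.zeros(n)
    v[emotion_id] = 1.0
    return v

def flow_matching_pair(x1, rng):
    """Build one training sample for linear-path flow matching.

    x1 is a demonstrated pose. A network would be regressed so that
    its output at (xt, t, condition) matches target_v.
    """
    x0 = rng.standard_normal(x1.shape)   # noise sample
    t = rng.uniform()                    # time in [0, 1)
    xt = (1.0 - t) * x0 + t * x1         # point on the straight path
    target_v = x1 - x0                   # constant velocity of that path
    return xt, t, target_v

def sample_pose(velocity_fn, emotion_id, rng, steps=10):
    """Generate a pose by Euler-integrating the velocity field from noise."""
    x = rng.standard_normal(POSE_DIM)
    cond = one_hot(emotion_id)
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(x, k * dt, cond)
    return x

# Stand-in for a trained network (assumption): each emotion has one fixed
# target pose, and the field points straight at it.
TOY_TARGETS = np.linspace(-1.0, 1.0, NUM_EMOTIONS)[:, None] * np.ones(POSE_DIM)

def toy_velocity(x, t, cond):
    target = TOY_TARGETS[int(np.argmax(cond))]
    return (target - x) / (1.0 - t)

rng = np.random.default_rng(0)
pose = sample_pose(toy_velocity, emotion_id=5, rng=rng)
```

A small number of Euler steps keeps sampling cheap, which is one reason this family of models is attractive for real-time use; the step count of 10 here is illustrative and not taken from the paper.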
