FABG: End-to-end Imitation Learning for Embodied Affective Human-Robot Interaction
Yanghai Zhang, Changyi Liu, Keting Fu, Wenbin Zhou, Qingdu Li, Jianwei Zhang
TL;DR
FABG tackles end-to-end imitation learning for expressive human-robot interaction by combining an immersive VR-based demonstration pipeline, depth-enhanced multimodal perception, and a Prediction-Driven Latency Compensation (PDLC) mechanism. Demonstrations are collected with synchronized operator facial cues and first-person RGB-D views, and semantic and geometric features are fused into a robust 3D perception representation, enabling real-time, fluid facial behaviors on a 25-DoF humanoid head. PDLC eliminates action stuttering and compensates for multi-source delays: the policy is queried at every timestep, and from each predicted fixed-length sequence of $k$ actions the robot executes the $(n+1)$-th action rather than the first. Experiments across four interaction tasks show significant improvements in response speed and motion fluency when RGB-D input is combined with PDLC, supporting the practicality of FABG for social robotics and suggesting future integration with language models for richer human-robot communication.
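The PDLC selection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `pdlc_select`, `run_pdlc`, and `toy_policy` are hypothetical, actions are reduced to scalars, and the reading of $n$ as a delay offset measured in timesteps (with $0 \le n < k$) is an assumption, since the summary does not define it.

```python
from typing import Callable, List, Sequence


def pdlc_select(action_seq: Sequence[float], n: int) -> float:
    """Return the (n+1)-th action, i.e. 0-based index n, of a predicted sequence."""
    if not 0 <= n < len(action_seq):
        raise ValueError("n must satisfy 0 <= n < k")
    return action_seq[n]


def run_pdlc(
    policy: Callable[[float, int], Sequence[float]],
    observations: Sequence[float],
    k: int,
    n: int,
) -> List[float]:
    """Query the policy at every timestep and execute one action per query."""
    executed = []
    for obs in observations:
        seq = policy(obs, k)  # fresh length-k prediction at each timestep
        if len(seq) != k:
            raise ValueError("policy must return exactly k actions")
        # Skip the first n predicted actions (assumed to be already stale
        # because of sensing, inference, and actuation delay).
        executed.append(pdlc_select(seq, n))
    return executed


def toy_policy(obs: float, k: int) -> List[float]:
    """Hypothetical stand-in policy: predicts obs+1, obs+2, ..., obs+k."""
    return [obs + i + 1 for i in range(k)]


if __name__ == "__main__":
    # With k = 4 and n = 2, the third predicted action is executed each step.
    print(run_pdlc(toy_policy, [0.0, 10.0, 20.0], k=4, n=2))
```

Because a new sequence is predicted at every timestep, only one action per prediction is executed in this sketch; the remaining $k-1$ predicted actions are discarded rather than queued.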
Abstract
This paper proposes FABG (Facial Affective Behavior Generation), an end-to-end imitation learning system for human-robot interaction, designed to generate natural and fluid facial affective behaviors. Obtaining high-quality demonstrations of interactive behavior remains a challenge. To address it, we develop an immersive virtual reality (VR) demonstration system that lets operators perceive the environment stereoscopically. The system ensures that "the operator's visual perception matches the robot's sensory input" and "the operator's actions directly determine the robot's behaviors", as if the operator had taken the robot's place in the interaction. We propose a prediction-driven latency compensation strategy to reduce robotic reaction delays and enhance interaction fluency. FABG naturally acquires human interactive behaviors and intuition-driven subconscious motions, eliminating manual behavior scripting. We deploy FABG on a real-world 25-degree-of-freedom (DoF) humanoid robot, validating its effectiveness through four fundamental interaction tasks: expression response, dynamic gaze, foveated attention, and gesture recognition, supported by data collection and policy training. Project website: https://cybergenies.github.io
