Cross-Embodiment Robotic Manipulation Synthesis via Guided Demonstrations through CycleVAE and Human Behavior Transformer
Apan Dastider, Hao Fang, Mingjie Lin
TL;DR
The paper tackles cross-embodiment robotic manipulation, where human and robot systems differ in geometry and dynamics. It introduces CycleVAE for bidirectional latent alignment between human demonstrations and robotic trajectories, and a causal Human Behavior Transformer that generates abundant expert-like demonstrations. The CycleVAE learns embodiment-agnostic representations through cycle consistency and latent-space alignment via mean and covariance terms, enabling end-to-end trajectory synthesis without paired data. The Human Behavior Transformer serves as a scalable generator of human-like demonstrations, accelerating data collection and reducing reliance on human experts. Experiments on a 7-DoF Franka Panda with a QB SoftHand2 for ball tossing and catching show superior performance and faster generation compared to multiple state-of-the-art baselines, demonstrating effective unsupervised cross-embodiment alignment and potential for broader autonomous robotics applications.
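To make the phrase "latent-space alignment via mean and covariance terms" concrete, the following is a minimal sketch of a generic moment-matching penalty between two batches of latent vectors. It is not the paper's loss: the function names, the squared-difference form, and the `cov_weight` parameter are assumptions made for illustration only.

```python
def mean_vector(batch):
    """Per-dimension mean of a batch of equal-length latent vectors."""
    n = len(batch)
    dim = len(batch[0])
    return [sum(row[j] for row in batch) / n for j in range(dim)]


def covariance_matrix(batch):
    """Unbiased sample covariance (dim x dim) of a batch of latent vectors."""
    n = len(batch)
    mu = mean_vector(batch)
    dim = len(mu)
    centered = [[row[j] - mu[j] for j in range(dim)] for row in batch]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]


def alignment_penalty(human_latents, robot_latents, cov_weight=1.0):
    """Hypothetical penalty: squared gap between the means plus a weighted
    squared gap between the covariances of the two latent batches."""
    mu_h = mean_vector(human_latents)
    mu_r = mean_vector(robot_latents)
    dim = len(mu_h)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_h, mu_r))
    cov_h = covariance_matrix(human_latents)
    cov_r = covariance_matrix(robot_latents)
    cov_term = sum(
        (cov_h[i][j] - cov_r[i][j]) ** 2
        for i in range(dim)
        for j in range(dim)
    )
    return mean_term + cov_weight * cov_term


# Toy batches: the "robot" latents are the "human" latents shifted along axis 0,
# so only the mean term contributes to the penalty.
human = [[0.0, 0.0], [1.0, 2.0], [2.0, 1.0]]
robot = [[1.0, 0.0], [2.0, 2.0], [3.0, 1.0]]
penalty = alignment_penalty(human, robot)
```

In this toy case the two batches share the same covariance and differ only by a constant offset, so the penalty reduces to the squared distance between their means. The two batches need not be paired sample by sample, which is what makes such a term usable without paired data.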
Abstract
Cross-embodiment robotic manipulation synthesis for complicated tasks is challenging, partly due to the scarcity of paired cross-embodiment datasets and the difficulty of designing intricate controllers. Inspired by robotic learning via guided human expert demonstration, we propose a novel cross-embodiment robotic manipulation algorithm based on CycleVAE and a human behavior transformer. First, we use an unsupervised CycleVAE together with a bidirectional subspace alignment algorithm to align latent motion sequences across embodiments. Second, we propose a causal human behavior transformer design to learn the intrinsic motion dynamics of human expert demonstrations. At test time, we use the proposed transformer to generate human expert demonstrations, which are then aligned by the CycleVAE for the final human-robotic manipulation synthesis. We validated the proposed algorithm through extensive experiments using a dexterous robotic manipulator equipped with a robotic hand. Our method generates smooth trajectories across intricate tasks, outperforming prior learning-based robotic motion planning algorithms. These results have implications for unsupervised cross-embodiment alignment and future autonomous robotics design. Complete video demonstrations of our experiments can be found at https://sites.google.com/view/humanrobots/home.
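As an illustration of what "causal" means for a sequence model of demonstrations, the sketch below applies a lower-triangular mask to a matrix of attention scores so that each timestep attends only to itself and earlier timesteps. This is the standard causal-masking idea rather than the paper's architecture; the function names and the toy scores are hypothetical.

```python
import math


def causal_mask(seq_len):
    """mask[i][j] is True when timestep i may attend to timestep j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def causal_attention_weights(scores):
    """Row-wise softmax over a square score matrix, restricted to past and
    present positions; future positions receive weight 0.0."""
    seq_len = len(scores)
    mask = causal_mask(seq_len)
    weights = []
    for i in range(seq_len):
        # Allowed positions are exactly the first i + 1 entries of row i.
        allowed = [scores[i][j] for j in range(seq_len) if mask[i][j]]
        peak = max(allowed)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in allowed]
        total = sum(exps)
        row = [e / total for e in exps] + [0.0] * (seq_len - len(allowed))
        weights.append(row)
    return weights


# Toy example: uniform scores over a 3-step motion sequence.
uniform_scores = [[0.0] * 3 for _ in range(3)]
attn = causal_attention_weights(uniform_scores)
```

With uniform scores, the first timestep places all of its weight on itself, and each later timestep spreads its weight evenly over the steps seen so far. This is the property that lets an autoregressive model generate a trajectory one step at a time.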
