Cross-Embodiment Robotic Manipulation Synthesis via Guided Demonstrations through CycleVAE and Human Behavior Transformer

Apan Dastider, Hao Fang, Mingjie Lin

TL;DR

The paper tackles cross-embodiment robotic manipulation where human and robot systems differ in geometry and dynamics by introducing CycleVAE for bidirectional latent alignment between human demonstrations and robotic trajectories, and a causal Human Behavior Transformer to generate abundant expert-like demonstrations. The CycleVAE learns embodiment-agnostic representations with cycle consistency and latent-space alignment via mean and covariance terms, enabling end-to-end trajectory synthesis without paired data. The Human Behavior Transformer serves as a scalable generator of human-like demonstrations, accelerating data collection and reducing reliance on human experts. Experiments on a 7-DoF Franka Panda with QB SoftHand2 for ball tossing and catching show superior performance and faster generation compared to multiple SOTA baselines, demonstrating effective unsupervised cross-embodiment alignment and potential for broader autonomous robotics applications.
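The summary above mentions latent-space alignment "via mean and covariance terms" plus cycle consistency. The following is a minimal NumPy sketch of what such losses could look like, not the authors' implementation. The function names, the equal weighting of the two terms, and the squared-difference form are all assumptions made for illustration.

```python
import numpy as np

def moment_alignment_loss(z_human, z_robot):
    """Distance between the first and second moments of two latent batches.

    z_human: (N, d) latents from the human-motion encoder (hypothetical).
    z_robot: (M, d) latents from the robot-motion encoder (hypothetical).
    The batches are unpaired: only batch statistics are compared.
    """
    mu_h = z_human.mean(axis=0)
    mu_r = z_robot.mean(axis=0)
    cov_h = np.cov(z_human, rowvar=False)  # (d, d) sample covariance
    cov_r = np.cov(z_robot, rowvar=False)
    mean_term = np.sum((mu_h - mu_r) ** 2)
    cov_term = np.sum((cov_h - cov_r) ** 2)
    # Assumption: both terms are weighted equally.
    return mean_term + cov_term

def cycle_consistency_loss(x, to_other, back):
    """Mean squared error after mapping x to the other domain and back.

    to_other and back stand in for the two cross-domain mappings
    (e.g. human -> robot and robot -> human); both are hypothetical callables.
    """
    x_cycled = back(to_other(x))
    return np.mean((x - x_cycled) ** 2)
```

Because the moment terms compare only batch statistics, they need no one-to-one pairing between human and robot samples, which is consistent with the "without paired data" claim in the summary.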

Abstract

Cross-embodiment robotic manipulation synthesis for complicated tasks is challenging, partly because paired cross-embodiment datasets are scarce and intricate controllers are difficult to design. Inspired by robotic learning via guided human expert demonstration, we propose a novel cross-embodiment robotic manipulation algorithm built on CycleVAE and a human behavior transformer. First, we use an unsupervised CycleVAE together with a bidirectional subspace alignment algorithm to align latent motion sequences across embodiments. Second, we propose a causal human behavior transformer design to learn the intrinsic motion dynamics of human expert demonstrations. At test time, we use the proposed transformer to generate human expert demonstrations, which are then aligned using CycleVAE for the final human-robotic manipulation synthesis. We validated the proposed algorithm through extensive experiments using a dexterous robotic manipulator equipped with a robotic hand. Our method generates smooth trajectories across intricate tasks, outperforming prior learning-based robotic motion planning algorithms. These results have implications for unsupervised cross-embodiment alignment and future autonomous robotics design. Complete video demonstrations of our experiments can be found at https://sites.google.com/view/humanrobots/home.

Paper Structure

This paper contains 16 sections, 8 equations, 8 figures, 2 tables, 1 algorithm.

Figures (8)

  • Figure 1: Human expert demonstration (left) and robotic motion (right). Knowledge is distilled between the two motions through our bidirectional subspace alignment method (middle).
  • Figure 2: Overall diagram of our contributions: (i) domain adaptation via the bidirectional subspace alignment algorithm; (ii.a) mapping flow from human demonstration to robotic motion synthesis; (ii.b) when demonstrations are unavailable, bidirectional mapping from the robot initial state to the expert state, followed by expert future motion prediction; (iii) no-expert trajectory synthesis using the causal human behavior transformer.
  • Figure 3: Our proposed CycleVAE architecture for bidirectional alignment of human motion and robot motion. We build two VAEs to learn the latent representations of the two motion modalities, and apply a cycle consistency loss to ensure the alignment is bidirectional.
  • Figure 4: The architecture of our proposed Human Behavior Transformer. We use the masked multi-head attention block in the transformer decoder to ensure causal modeling. The transformer is trained autoregressively to predict future human behaviors.
  • Figure 5: Our hardware research robotic platform: a 7-DoF Franka Emika Panda robotic manipulator with a QB SoftHand 2 Research attached at a $90^\circ$ joint, and an orange softball as the object to manipulate.
  • ...and 3 more figures
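
The Figure 4 caption describes masked multi-head attention for causal modeling and autoregressive prediction of future behavior. The sketch below illustrates those two ideas with a single attention head in NumPy. It is a generic illustration of causal masking, not the paper's architecture, and `causal_mask`, `causal_self_attention`, `rollout`, and `predict_next` are hypothetical names introduced here.

```python
import numpy as np

def causal_mask(T):
    """Boolean (T, T) mask: entry [t, s] is True when step t may attend to step s <= t."""
    return np.tril(np.ones((T, T), dtype=bool))

def causal_self_attention(q, k, v):
    """Single-head scaled dot-product attention with a causal mask.

    q, k, v: arrays of shape (T, d). Returns (output, weights).
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # Future positions get -inf so their softmax weight is exactly zero.
    scores = np.where(causal_mask(T), scores, -np.inf)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def rollout(predict_next, history, steps):
    """Autoregressive generation: feed each prediction back in as input.

    predict_next is a hypothetical callable mapping a (t, d) history
    to the next (d,) state.
    """
    seq = list(history)
    for _ in range(steps):
        seq.append(predict_next(np.stack(seq)))
    return np.stack(seq)
```

With the mask in place, the output at step t depends only on steps up to t, so the same model can be trained on whole sequences in parallel and still be rolled out one step at a time at generation time.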