Cross-Embodiment Robot Manipulation Skill Transfer using Latent Space Alignment
Tianyu Wang, Dwait Bhatt, Xiaolong Wang, Nikolay Atanasov
TL;DR
This paper tackles transferring manipulation policies between robots with different morphologies. The authors first train encoders, decoders, and a latent policy in a source domain, which defines a common latent space. They then align each target domain to this latent space using adversarial training and cycle consistency on unpaired data, enabling zero-shot policy transfer without task rewards in the target domain. The approach is validated with sim-to-sim transfers among Panda, Sawyer, and xArm6 robots, and with sim-to-real transfer to a physical xArm6. Because only the target encoders and decoders are trained for a new robot, the learned policy can be reused across embodiments without retraining it.
Abstract
This paper focuses on transferring control policies between robot manipulators with different morphology. While reinforcement learning (RL) methods have shown successful results in robot manipulation tasks, transferring a trained policy from simulation to a real robot or deploying it on a robot with different states, actions, or kinematics is challenging. To achieve cross-embodiment policy transfer, our key insight is to project the state and action spaces of the source and target robots to a common latent space representation. We first introduce encoders and decoders to associate the states and actions of the source robot with a latent space. The encoders, decoders, and a latent space control policy are trained simultaneously using loss functions measuring task performance, latent dynamics consistency, and encoder-decoder ability to reconstruct the original states and actions. To transfer the learned control policy, we only need to train target encoders and decoders that align a new target domain to the latent space. We use generative adversarial training with cycle consistency and latent dynamics losses without access to the task reward or reward tuning in the target domain. We demonstrate sim-to-sim and sim-to-real manipulation policy transfer with source and target robots of different states, actions, and embodiments. The source code is available at https://github.com/ExistentialRobotics/cross_embodiment_transfer.
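The loss terms named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the dimensions, the random linear maps standing in for learned networks, and the weight `lambda_dyn` are all assumptions made for illustration. The task-performance (RL) loss and the cycle-consistency loss are omitted, and the adversarial term uses the standard GAN objective on latent states, which may differ from the form used in the paper.

```python
import numpy as np

def mse(x, y):
    """Mean squared error between two arrays of equal shape."""
    return float(np.mean((x - y) ** 2))

def reconstruction_loss(enc, dec, x):
    """Penalize the decoder for failing to recover x from its latent code."""
    return mse(dec(enc(x)), x)

def latent_dynamics_loss(enc_s, enc_a, dyn, s, a, s_next):
    """The predicted next latent state should match the encoded next state."""
    z_pred = dyn(np.concatenate([enc_s(s), enc_a(a)], axis=-1))
    return mse(z_pred, enc_s(s_next))

def adversarial_losses(disc, z_source, z_target, eps=1e-8):
    """Standard GAN losses on latent states (an assumed form).

    disc maps a latent state to the probability that it came from the source.
    Returns (discriminator loss, target-encoder loss).
    """
    p_src, p_tgt = disc(z_source), disc(z_target)
    d_loss = -float(np.mean(np.log(p_src + eps)) + np.mean(np.log(1.0 - p_tgt + eps)))
    g_loss = -float(np.mean(np.log(p_tgt + eps)))
    return d_loss, g_loss

# --- Hypothetical setup: random linear maps stand in for learned networks ---
rng = np.random.default_rng(0)

def random_linear(in_dim, out_dim):
    W = rng.normal(scale=0.1, size=(in_dim, out_dim))
    return lambda x: x @ W

SRC_STATE, SRC_ACTION, TGT_STATE = 7, 4, 6   # assumed dimensions
LATENT_STATE, LATENT_ACTION = 3, 2           # assumed latent sizes
BATCH = 16

enc_s, dec_s = random_linear(SRC_STATE, LATENT_STATE), random_linear(LATENT_STATE, SRC_STATE)
enc_a, dec_a = random_linear(SRC_ACTION, LATENT_ACTION), random_linear(LATENT_ACTION, SRC_ACTION)
dyn = random_linear(LATENT_STATE + LATENT_ACTION, LATENT_STATE)
enc_s_tgt = random_linear(TGT_STATE, LATENT_STATE)

w_disc = rng.normal(size=LATENT_STATE)
disc = lambda z: 1.0 / (1.0 + np.exp(-(z @ w_disc)))  # logistic discriminator

# Source-domain batch of transitions (random placeholders, not real data).
s = rng.normal(size=(BATCH, SRC_STATE))
a = rng.normal(size=(BATCH, SRC_ACTION))
s_next = rng.normal(size=(BATCH, SRC_STATE))

# Stage 1 (source domain): reconstruction plus latent dynamics consistency.
lambda_dyn = 1.0  # hypothetical weight
source_loss = (reconstruction_loss(enc_s, dec_s, s)
               + reconstruction_loss(enc_a, dec_a, a)
               + lambda_dyn * latent_dynamics_loss(enc_s, enc_a, dyn, s, a, s_next))

# Stage 2 (target domain): unpaired target states are pushed toward the
# distribution of source latents; no task reward appears anywhere.
s_tgt = rng.normal(size=(BATCH, TGT_STATE))
d_loss, g_loss = adversarial_losses(disc, enc_s(s), enc_s_tgt(s_tgt))
```

In this sketch the source and target batches are sampled independently, which mirrors the unpaired-data setting: the adversarial term only compares distributions of latent states, so no correspondence between individual source and target samples is required.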
