Cross-Embodiment Robot Manipulation Skill Transfer using Latent Space Alignment

Tianyu Wang, Dwait Bhatt, Xiaolong Wang, Nikolay Atanasov

TL;DR

This paper tackles the transfer of manipulation policies across robots with different morphologies. The authors learn a common latent space by jointly training encoders, decoders, and a latent policy in a source domain, then align each target domain to this latent space using adversarial training and cycle consistency on unpaired data, so the policy transfers zero-shot without target-domain rewards. The approach is validated with sim-to-sim transfer among Panda, Sawyer, and xArm6 robots and with sim-to-real transfer to a physical xArm6, showing competitive performance. Because only the target encoders and decoders are trained for a new robot, the learned skill can be reused across embodiments without additional task-specific training data in the target domain.

Abstract

This paper focuses on transferring control policies between robot manipulators with different morphology. While reinforcement learning (RL) methods have shown successful results in robot manipulation tasks, transferring a trained policy from simulation to a real robot or deploying it on a robot with different states, actions, or kinematics is challenging. To achieve cross-embodiment policy transfer, our key insight is to project the state and action spaces of the source and target robots to a common latent space representation. We first introduce encoders and decoders to associate the states and actions of the source robot with a latent space. The encoders, decoders, and a latent space control policy are trained simultaneously using loss functions measuring task performance, latent dynamics consistency, and encoder-decoder ability to reconstruct the original states and actions. To transfer the learned control policy, we only need to train target encoders and decoders that align a new target domain to the latent space. We use generative adversarial training with cycle consistency and latent dynamics losses without access to the task reward or reward tuning in the target domain. We demonstrate sim-to-sim and sim-to-real manipulation policy transfer with source and target robots of different states, actions, and embodiments. The source code is available at https://github.com/ExistentialRobotics/cross_embodiment_transfer.
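The deployment step described in the abstract can be read as a composition of three maps: a target state encoder, the latent policy, and a target action decoder. The following is a minimal sketch of that composition, not the authors' implementation; the names `make_target_policy`, `encode_state`, `latent_policy`, and `decode_action`, as well as the toy maps and dimensions, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Map = Callable[[Sequence[float]], Vector]


def make_target_policy(encode_state: Map, latent_policy: Map, decode_action: Map) -> Map:
    """Compose a target state encoder, a latent policy, and a target action decoder."""

    def target_policy(state: Sequence[float]) -> Vector:
        z_state = encode_state(state)      # target state  -> latent state
        z_action = latent_policy(z_state)  # latent state  -> latent action
        return decode_action(z_action)     # latent action -> target action

    return target_policy


# Toy stand-ins (assumptions for illustration, not learned networks).
def encode_state(state: Sequence[float]) -> Vector:
    # Keep the first two coordinates as a 2-D latent state.
    return [state[0], state[1]]


def latent_policy(z_state: Sequence[float]) -> Vector:
    # Proportional controller that pushes the latent state toward the origin.
    return [-0.5 * z for z in z_state]


def decode_action(z_action: Sequence[float]) -> Vector:
    # Pad the 2-D latent action to a 3-D target action.
    return [z_action[0], z_action[1], 0.0]


policy = make_target_policy(encode_state, latent_policy, decode_action)
action = policy([0.2, -0.4, 1.0, 0.0])
print(action)
```

In this reading, the latent policy is reused unchanged, and only the two target-side maps would need to be trained for a new robot.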

Paper Structure (15 sections, 11 equations, 9 figures, 3 tables, 2 algorithms)

Figures (9)

  • Figure 1: Policy transfer to different robot embodiments using latent space alignment. In the source domain (left), we train a simulated Panda robot to pick and place objects. Our approach allows transferring the policy to different target domains (right), such as a simulated Sawyer robot (top right) or a real xArm6 robot (bottom right), without requiring additional task-specific training data.
  • Figure 2: Approach overview: (a) The source robot learns encoders and decoders $F_s, G_s, \tilde{F}_s, \tilde{G}_s$ for state-action projections between its own space and a latent space. The latent policy $\pi^z$ is trained with RL simultaneously with the encoders and decoders. (b) During latent alignment, the source encoders and decoders are frozen while the target encoders and decoders are trained to match latent distributions and to satisfy cycle consistency and latent dynamics constraints. (c) During target deployment, we compose the target encoder and decoder functions trained in (b) with the latent policy trained in (a).
  • Figure 3: Overview of target domain alignment losses: (left) the adversarial loss ensures that the state-action distributions in the source and target domain match, (middle) the cycle consistency loss regularizes state-action samples to be close to themselves when translated to the other domain and back, (right) the latent dynamics loss enforces consistent forward and inverse latent transitions.
  • Figure 4: Robosuite simulation tasks (from left to right): Reach, Lift, PickPlace and Stack.
  • Figure 5: Ablation on latent state and action dimensions for policy transfer from Panda to Sawyer and xArm6 robots on the Reach task. The smallest latent state and action dimension that still gives reasonable performance is 4.
  • ...and 4 more figures
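
The cycle consistency idea in the Figure 3 caption (a sample should stay close to itself after being translated to the other domain and back) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's loss: the L1 distance, the function names (`cycle_consistency_loss`, `enc_t`, `dec_s`, `enc_s`, `dec_t`), and the toy scaling maps are all hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Map = Callable[[Sequence[float]], Vector]


def mean_abs_error(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute (L1) difference between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(samples: Sequence[Sequence[float]],
                           enc_t: Map, dec_s: Map, enc_s: Map, dec_t: Map) -> float:
    """Average L1 error after the round trip target -> source -> target."""
    total = 0.0
    for x in samples:
        x_source = dec_s(enc_t(x))       # translate target sample into the source domain
        x_back = dec_t(enc_s(x_source))  # translate it back into the target domain
        total += mean_abs_error(x, x_back)
    return total / len(samples)


# Toy setup (assumption): the source domain equals the latent space,
# and the target domain is the latent space scaled by 2.
def identity(v: Sequence[float]) -> Vector:
    return list(v)


def enc_t(x: Sequence[float]) -> Vector:
    return [xi / 2.0 for xi in x]


def dec_t_aligned(z: Sequence[float]) -> Vector:
    return [2.0 * zi for zi in z]


def dec_t_misaligned(z: Sequence[float]) -> Vector:
    return [3.0 * zi for zi in z]  # deliberately wrong scale


samples = [[1.0, -2.0], [4.0, 0.0]]
loss_aligned = cycle_consistency_loss(samples, enc_t, identity, identity, dec_t_aligned)
loss_misaligned = cycle_consistency_loss(samples, enc_t, identity, identity, dec_t_misaligned)
print(loss_aligned, loss_misaligned)
```

In this toy case the aligned decoder reproduces every sample exactly, while the misaligned decoder incurs a positive penalty. In the described method this term is combined with the adversarial and latent dynamics losses, which are not shown here.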