
LEGATO: Cross-Embodiment Imitation Using a Grasping Tool

Mingyo Seo, H. Andy Park, Shenli Yuan, Yuke Zhu, Luis Sentis

TL;DR

LEGATO addresses the challenge of transferring visuomotor skills across robots with different morphologies by introducing a handheld gripper that unifies the action and observation spaces used for demonstration and deployment. A high-level visuomotor policy outputs gripper trajectories in SE(3), which are then retargeted to diverse robots through an IK-based quadratic program with eSNS optimization. The policy is trained with a motion-invariant loss computed in the Denavit-Hartenberg-inspired Bidirectional (DHB) invariant space. The core technical contributions are the two-tier design (a visuomotor policy plus IK retargeting), the motion-invariant loss that reduces embodiment bias, and the LEGATO Gripper, which enables hardware-agnostic demonstrations. Experiments in simulation and on real robots demonstrate improved cross-embodiment transfer and practical viability for scalable imitation learning across heterogeneous robotic platforms.
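In the two-tier design, the policy only emits gripper poses and a per-robot IK layer turns them into joint motions. LEGATO's retargeting is an IK quadratic program with eSNS; the sketch below swaps in plain damped-least-squares differential IK, tracks position only, and uses two hypothetical planar arms (`ARM_A`, `ARM_B`, with assumed link lengths) to illustrate how one gripper trajectory can drive different kinematic chains. All names and parameter values are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical link lengths (metres) for two planar arms with different morphologies.
ARM_A = (0.4, 0.3, 0.2)
ARM_B = (0.5, 0.25, 0.15, 0.1)


def forward_kinematics(q, links):
    """End-effector (x, y) of a planar serial arm with revolute joints."""
    x = y = angle = 0.0
    for qi, li in zip(q, links):
        angle += qi
        x += li * math.cos(angle)
        y += li * math.sin(angle)
    return x, y


def jacobian(q, links):
    """2 x n positional Jacobian; column j sums the links distal to joint j."""
    cum, angle = [], 0.0
    for qi in q:
        angle += qi
        cum.append(angle)
    n = len(q)
    jac = [[0.0] * n, [0.0] * n]
    for j in range(n):
        for k in range(j, n):
            jac[0][j] -= links[k] * math.sin(cum[k])
            jac[1][j] += links[k] * math.cos(cum[k])
    return jac


def dls_step(q, target, links, damping=0.1, max_err=0.05):
    """One damped-least-squares update: dq = J^T (J J^T + damping^2 I)^-1 e."""
    x, y = forward_kinematics(q, links)
    ex, ey = target[0] - x, target[1] - y
    err = math.hypot(ex, ey)
    if err > max_err:  # clamp the task-space error so each step stays small
        ex, ey = ex * max_err / err, ey * max_err / err
    jac = jacobian(q, links)
    n = len(q)
    # Entries of the symmetric 2 x 2 matrix J J^T + damping^2 I.
    a = sum(v * v for v in jac[0]) + damping ** 2
    b = sum(jac[0][i] * jac[1][i] for i in range(n))
    d = sum(v * v for v in jac[1]) + damping ** 2
    det = a * d - b * b
    wx = (d * ex - b * ey) / det  # w = (J J^T + damping^2 I)^-1 e
    wy = (a * ey - b * ex) / det
    return [q[i] + jac[0][i] * wx + jac[1][i] * wy for i in range(n)]


def retarget(waypoints, links, iters=200):
    """Track gripper waypoints, warm-starting each solve from the previous pose."""
    q = [0.3] * len(links)  # bent start pose avoids the stretched-out singularity
    solutions = []
    for target in waypoints:
        for _ in range(iters):
            q = dls_step(q, target, links)
        solutions.append(list(q))
    return solutions
```

Calling `retarget([(0.6, 0.2), (0.5, 0.4), (0.3, 0.5)], ARM_A)` and the same call with `ARM_B` yields joint trajectories of different lengths (3 and 4 joints) that place the end effector on the same waypoints; only the IK layer knows which arm it is driving.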

Abstract

Cross-embodiment imitation learning enables policies trained on specific embodiments to transfer across different robots, unlocking the potential for large-scale imitation learning that is both cost-effective and highly reusable. This paper presents LEGATO, a cross-embodiment imitation learning framework for visuomotor skill transfer across varied kinematic morphologies. We introduce a handheld gripper that unifies action and observation spaces, allowing tasks to be defined consistently across robots. We train visuomotor policies on task demonstrations collected with this gripper through imitation learning, applying a transformation to a motion-invariant space to compute the training loss. Gripper motions generated by the policies are retargeted into high-degree-of-freedom whole-body motions using inverse kinematics for deployment across diverse embodiments. Our evaluations in simulation and real-robot experiments highlight the framework's effectiveness in learning and transferring visuomotor skills across various robots. More information can be found on the project page: https://ut-hcrl.github.io/LEGATO.
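The training loss is computed after mapping trajectories into a motion-invariant space. The sketch below is a deliberately simplified stand-in, not the DHB transform itself: it describes a 3-D position trajectory only by its step lengths and the turning angles between consecutive steps. That is enough to show the property that matters, namely that the loss does not change when the whole trajectory is rotated and translated. `invariants` and `invariant_l2_loss` are hypothetical names.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def invariants(points):
    """Simplified, DHB-inspired descriptor of a 3-D position trajectory.

    Returns per-step displacement magnitudes followed by the turning angle
    between consecutive displacements. Both are unchanged by any rigid
    rotation and translation of the whole trajectory. Consecutive duplicate
    points are not supported (their zero-length step has no direction).
    """
    steps = [_sub(points[i + 1], points[i]) for i in range(len(points) - 1)]
    mags = [_norm(s) for s in steps]
    angles = []
    for s0, s1, m0, m1 in zip(steps, steps[1:], mags, mags[1:]):
        c = _dot(s0, s1) / (m0 * m1)
        angles.append(math.acos(max(-1.0, min(1.0, c))))  # clamp for rounding
    return mags + angles


def invariant_l2_loss(pred, target):
    """Mean squared error between the invariant descriptors of two trajectories."""
    a, b = invariants(pred), invariants(target)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
```

Full DHB invariants also describe the rotational part of the motion and carry a second angle per step, so this descriptor discards information that the real representation keeps; it only illustrates why a loss in an invariant space is blind to where and how the trajectory is placed in the world frame.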

Paper Structure

This paper contains 23 sections, 11 equations, 10 figures.

Figures (10)

  • Figure 1: Overview of LEGATO. LEGATO addresses the challenge of transferring visuomotor skills across diverse robot embodiments. We present a cross-embodiment imitation learning framework using a versatile handheld grasping tool that ensures consistent physical interactions across different embodiments. Visuomotor policies trained on demonstrations by humans or teleoperated robots using the tool can be deployed across various robots equipped with the same gripper. Motion retargeting enables the execution of trajectories on different robots without requiring robot-specific training data.
  • Figure 2: LEGATO's cross-embodiment learning pipeline. During data collection, the LEGATO Gripper records its trajectories, grasping actions, and visual observations captured by its egocentric stereo camera. A visuomotor policy is then trained on these demonstrations through imitation learning. During deployment, the visuomotor policy's outputs are retargeted to the robots' whole-body motions through IK optimization.
  • Figure 3: High-level visuomotor policy architecture. The trained policies generate desired handheld-gripper trajectories and grasping actions $u_t$ at 10 Hz from egocentric stereo camera observations and previous policy actions. These action and observation spaces, defined in the handheld-gripper frame, remain consistent across robot platforms. To learn the handheld-gripper trajectories, we apply two action losses: the negative log-likelihood loss $\mathcal{L}_{\text{NLL}}$ for the distribution in SE(3) and the L2 loss $\mathcal{L}_{\text{invar}}$ in the DHB motion-invariant space. The grasping actions are trained using the cross-entropy loss $\mathcal{L}_{\text{CE}}$.
  • Figure 4: LEGATO Gripper design. The LEGATO Gripper is designed for both human demonstration collection and robot deployment. (Left) It features a shared actuated gripper with adaptable handles, ensuring reliable human handling and consistent grasping across robots while minimizing components. (Right top) A human demonstrator can directly perform tasks by carrying the LEGATO Gripper in hand. The design includes a simple yet intuitive button interface with a status LED, allowing data recording to start and end with a double-click and grasping actions to trigger with a single click. (Right bottom) The LEGATO Gripper is easily installed on various robots, securely held by their original grippers, and is ready for immediate use.
  • Figure 5: Timelapse of deploying LEGATO in simulation. We trained visuomotor policies using demonstrations from the Abstract embodiment and deployed them on robots with diverse morphologies, from the top: Abstract, Panda, Spot, GR-1, and Google Robot. The timelapse frames for each robot are taken at the same time steps. The tracking performance of the IK motion retargeting varies with morphology, creating domain gaps across embodiments. Despite these gaps, LEGATO deploys successfully on various robots.
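Figure 4 describes a one-button protocol on the LEGATO Gripper: a double-click starts or ends data recording, and a single click triggers a grasp. The sketch below decodes button-press timestamps under two assumptions the caption does not state: a 0.4 s double-click window, and a grasp command that toggles between open and closed. `ButtonDecoder` and its event names are hypothetical.

```python
from dataclasses import dataclass, field

# Assumed maximum gap (seconds) between two presses that count as a double-click;
# the paper does not give a value.
DOUBLE_CLICK_WINDOW = 0.4


@dataclass
class ButtonDecoder:
    """Hypothetical decoder for a one-button record/grasp interface."""

    recording: bool = False
    gripper_closed: bool = False
    events: list = field(default_factory=list)

    def process(self, press_times, window=DOUBLE_CLICK_WINDOW):
        """Turn a sorted list of press timestamps into (time, action) events."""
        i = 0
        while i < len(press_times):
            t = press_times[i]
            is_double = i + 1 < len(press_times) and press_times[i + 1] - t <= window
            if is_double:
                # Double-click: toggle data recording on or off.
                self.recording = not self.recording
                self.events.append((t, "start_recording" if self.recording else "stop_recording"))
                i += 2
            else:
                # Single click: toggle the grasp (assumed open/close toggle).
                self.gripper_closed = not self.gripper_closed
                self.events.append((t, "close_gripper" if self.gripper_closed else "open_gripper"))
                i += 1
        return self.events
```

Decoding a complete timestamp list offline hides a real-time trade-off: a live device must wait out the double-click window before committing to a single-click grasp, so the window length adds that much latency to every grasp command.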