
UMI-on-Air: Embodiment-Aware Guidance for Embodiment-Agnostic Visuomotor Policies

Harsh Gupta, Xiaofeng Guo, Huy Ha, Chuer Pan, Muqing Cao, Dongjae Lee, Sebastian Scherer, Shuran Song, Guanya Shi

TL;DR

This work addresses the challenge of deploying embodiment-agnostic visuomotor policies on constrained embodiments such as aerial manipulators by introducing Embodiment-Aware Diffusion Policy (EADP), which integrates gradient feedback from embodiment-specific controllers into the diffusion sampling process. Training uses unconstrained human demonstrations collected with a handheld UMI interface, enabling a versatile diffusion policy that can be guided toward embodiment-feasible trajectories at test time. Across simulation and real-world aerial manipulation tasks, EADP reduces the embodiment gap, improves success rates under disturbances, and demonstrates cross-environment generalization without retraining. The approach enables plug-and-play, scalable deployment of general manipulation skills to diverse hardware and environments, advancing practical universal manipulation capabilities.

Abstract

We introduce UMI-on-Air, a framework for embodiment-aware deployment of embodiment-agnostic manipulation policies. Our approach leverages diverse, unconstrained human demonstrations collected with a handheld gripper (UMI) to train generalizable visuomotor policies. A central challenge in transferring these policies to constrained robotic embodiments, such as aerial manipulators, is the mismatch in control and robot dynamics, which often leads to out-of-distribution behaviors and poor execution. To address this, we propose Embodiment-Aware Diffusion Policy (EADP), which couples a high-level UMI policy with a low-level embodiment-specific controller at inference time. By integrating gradient feedback from the controller's tracking cost into the diffusion sampling process, our method steers trajectory generation towards dynamically feasible modes tailored to the deployment embodiment. This enables plug-and-play, embodiment-aware trajectory adaptation at test time. We validate our approach on multiple long-horizon and high-precision aerial manipulation tasks, showing improved success rates, efficiency, and robustness under disturbances compared to unguided diffusion baselines. Finally, we demonstrate deployment in previously unseen environments, using UMI demonstrations collected in the wild, highlighting a practical pathway for scaling generalizable manipulation skills across diverse, and even highly constrained, embodiments. All code, data, and checkpoints will be publicly released after acceptance. Result videos can be found at umi-on-air.github.io.
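
The guidance idea in the abstract, steering diffusion sampling with the gradient of a controller's tracking cost, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the "denoiser" is a hypothetical stand-in that pulls a 1-D waypoint sequence toward a demonstrated trajectory, the "tracking cost" is a hypothetical quadratic penalty on waypoints below a height limit `z_min` (standing in for an MPC tracking cost), and the noise schedule of a real diffusion sampler is omitted.

```python
import random


def toy_denoise_step(traj, target, rate=0.3):
    """Hypothetical stand-in for one denoising iteration of a trained policy:
    move every waypoint a fraction of the way toward the demonstrated target."""
    return [x + rate * (t - x) for x, t in zip(traj, target)]


def tracking_cost(traj, z_min):
    """Hypothetical tracking cost: quadratic penalty on waypoints below z_min.
    A real system would query the embodiment-specific controller instead."""
    return sum(max(0.0, z_min - x) ** 2 for x in traj)


def tracking_cost_grad(traj, z_min):
    """Analytic gradient of tracking_cost with respect to each waypoint."""
    return [-2.0 * max(0.0, z_min - x) for x in traj]


def sample(target, z_min, guidance_scale, steps=50, seed=0):
    """Iteratively 'denoise' a random trajectory; after each iteration,
    take a gradient step that lowers the tracking cost (the guidance term)."""
    rng = random.Random(seed)
    traj = [rng.gauss(0.0, 1.0) for _ in target]  # start from noise
    for _ in range(steps):
        traj = toy_denoise_step(traj, target)
        if guidance_scale > 0.0:
            grad = tracking_cost_grad(traj, z_min)
            traj = [x - guidance_scale * g for x, g in zip(traj, grad)]
    return traj


# The demonstrated trajectory dips to -1.0, below the assumed feasible limit.
target = [0.0, -0.5, -1.0, -0.5, 0.0]
z_min = -0.6

unguided = sample(target, z_min, guidance_scale=0.0)
guided = sample(target, z_min, guidance_scale=0.2)

print("unguided cost:", tracking_cost(unguided, z_min))
print("guided cost:  ", tracking_cost(guided, z_min))
```

In this toy setting the unguided sample reproduces the demonstrated dip, while the guided sample settles between the demonstration and the feasible region. `guidance_scale` sets that trade-off: larger values favor feasibility over fidelity to the demonstration.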

Paper Structure

This paper contains 19 sections, 8 equations, 8 figures, 1 algorithm.

Figures (8)

  • Figure 1: UMI-on-Air with Embodiment-Aware Guidance. Standard UMI (Universal Manipulation Interface; chi2024universal, ha2024umilegs) systems use one-way communication: the high-level policy sends its outputs to the low-level controller as end-effector trajectories, which are often suboptimal or even infeasible for a given embodiment. Our approach introduces two-way communication, letting the low-level controller steer UMI policies from actions with high tracking cost to those with lower cost, enabling more robust and high-performance cross-embodiment deployment.
  • Figure 2: Aerial Manipulation Tasks. Combining UMI and our embodiment-aware guidance approach enables scalable data collection and robust deployment of fully autonomous skills previously beyond reach. On our UAM, we showcase (a) lemon harvesting (the policy must find ripe yellow ones), (b) high-precision peg insertion in unseen environments, and (c) long-horizon light bulb installation tasks.
  • Figure 3: Embodiment-Aware Diffusion Policy. Using UMI, we collect data for an embodiment-agnostic Diffusion Policy, which iteratively denoises actions from visual inputs. To produce more feasible actions, we add gradients of the MPC's tracking cost to the diffusion model's output at each iteration, steering the denoising process akin to classifier guidance. Finally, the guided action sequence is tracked by MPC at 50 Hz.
  • Figure 4: Data Collection to Deployment. Our data collection setup contains an iPhone running SLAM tracking, a lightweight camera for deployment, and compliant, 3D-printed gripper fingers. By sharing the observation and action space between data collection and deployment time, we minimize the embodiment gap.
  • Figure 5: Policy Adaptation Across Embodiments. Across four simulated tasks (1) and three embodiments (2), EADP adapts the embodiment-agnostic diffusion policy to deployment embodiments with varying "UMI-abilities". Visualizing 32 action samples across different embodiments for the same observation, we observe that the UR10e's trajectories are guided upwards to be more kinematically feasible, avoiding kinematic singularities. In contrast, the UAM's trajectories are guided downwards to be more dynamically feasible due to perturbations along the $-Z$ direction.
  • ...and 3 more figures