
UMI-Underwater: Learning Underwater Manipulation without Underwater Teleoperation

Hao Li, Long Yin Chung, Jack Goler, Ryan Zhang, Xiaochi Xie, Huy Ha, Shuran Song, Mark Cutkosky

Abstract

Underwater robotic grasping is difficult due to degraded, highly variable imagery and the expense of collecting diverse underwater demonstrations. We introduce a system that (i) autonomously collects successful underwater grasp demonstrations via a self-supervised data collection pipeline and (ii) transfers grasp knowledge from on-land human demonstrations through a depth-based affordance representation that bridges the on-land-to-underwater domain gap and is robust to lighting and color shift. An affordance model trained on on-land handheld demonstrations is deployed underwater zero-shot via geometric alignment, and an affordance-conditioned diffusion policy is then trained on underwater demonstrations to generate control actions. In pool experiments, our approach improves grasping performance and robustness to background shifts, and enables generalization to objects seen only in on-land data, outperforming RGB-only baselines. Code, videos, and additional results are available at https://umi-under-water.github.io.
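At inference time the system described above is two-stage: a depth-based affordance model, trained only on on-land handheld demonstrations, predicts where to grasp, and an affordance-conditioned policy, trained on autonomously collected underwater demonstrations, predicts how to move. The sketch below only illustrates that conditioning interface; the module names, tensor shapes, and the plain regression head standing in for the diffusion denoiser are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-stage inference path (assumed structure; names,
# shapes, and the stand-in policy head are illustrative, not the paper's code).
import torch
import torch.nn as nn

class AffordanceModel(nn.Module):
    """Predicts a per-pixel grasp-affordance heatmap from a depth image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )
    def forward(self, depth):                    # depth: (B, 1, H, W)
        return torch.sigmoid(self.net(depth))    # affordance heatmap: (B, 1, H, W)

class AffordanceConditionedPolicy(nn.Module):
    """Stand-in for the diffusion policy: maps (depth, affordance) to an action chunk."""
    def __init__(self, horizon=8, action_dim=7):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 16, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, horizon * action_dim)
        self.horizon, self.action_dim = horizon, action_dim
    def forward(self, depth, affordance):
        feat = self.encoder(torch.cat([depth, affordance], dim=1))
        return self.head(feat).view(-1, self.horizon, self.action_dim)

affordance_model = AffordanceModel()         # trained on on-land handheld demonstrations
policy = AffordanceConditionedPolicy()       # trained on underwater demonstrations

depth = torch.rand(1, 1, 96, 96)             # placeholder depth frame from the ROV camera
with torch.no_grad():
    affordance = affordance_model(depth)     # zero-shot from land-trained model
    actions = policy(depth, affordance)      # affordance-conditioned action chunk
print(actions.shape)                         # (1, 8, 7)
```

Because the affordance prediction depends only on depth, the same heatmap representation is available on land and underwater, which is the property the abstract relies on to bridge the domain gap.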



Figures (7)

  • Figure 1: UMI-Underwater. We address data-collection and generalization bottlenecks in underwater manipulation by pairing autonomous, self-supervised data collection with a zero-shot, depth-based affordance predictor that transfers directly from land to water.
  • Figure 2: ROV Manipulation Setup for Self-supervised Underwater Data Collection. The setup consists of a swimming pool with objects scattered across the workspace for repeated grasp attempts. Four fixed external cameras are mounted at the pool corners to provide real-time 3D localization used for safety functions (but not provided as inputs to the learned policy). The tethered ROV operates in this environment to execute the staged grasping routine.
  • Figure 3: Autonomous Data Collection Pipeline. Our heuristic controller autonomously collects grasping episodes by using a segmentation model to select a target, servoing the object centroid to stage-specific pixel setpoints using PD control, closing the gripper when a depth threshold is met, and labeling success via drag validation (a hedged code sketch of this routine follows the figure list).
  • Figure 4: Autonomous Recovery Strategies include (a) regrasp after failed grasps and (b) backup when the robot overshoots. These strategies raise the success rate of demonstration collection and also make the learned policy more robust by including recovery behavior in the demonstrations.
  • Figure 5: UMI-Aquatic on-land demonstration setup. Our handheld gripper with an iPhone camera system and AprilTags enables portable data collection and reliable gripper-state tracking for automatic demonstration labeling. Cropping and geometric warping via reprojection align the iPhone view to match the underwater robot camera.
  • ...and 2 more figures
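For concreteness, here is a hedged sketch of the staged grasping routine summarized in the Figure 3 caption: a segmentation model supplies the target centroid, a PD controller servos it to stage-specific pixel setpoints, the gripper closes once a depth threshold is met, and success is labeled by drag validation. All interface names (segment_centroid, send_velocity, close_gripper, drag_validate), gains, setpoints, and thresholds are illustrative placeholders, not the paper's code.

```python
# Minimal sketch of the staged routine in Figure 3, under assumed interfaces:
# perception.segment_centroid(), robot.depth(), robot.send_velocity(),
# robot.close_gripper(), and drag_validate() are hypothetical placeholders,
# and the gains, setpoints, and threshold below are illustrative only.
import numpy as np

KP, KD = 0.004, 0.001                 # assumed PD gains on the pixel error
GRASP_DEPTH_THRESHOLD = 0.25          # assumed depth value that triggers the grasp
STAGE_SETPOINTS_PX = [(320, 180), (320, 300), (320, 420)]  # assumed stage setpoints

def servo_to(setpoint_px, perception, robot, dt=0.1, tol_px=8, max_steps=500):
    """PD-servo the segmented object centroid toward one stage's pixel setpoint."""
    prev_err = np.zeros(2)
    for _ in range(max_steps):
        centroid = perception.segment_centroid()        # centroid from the segmentation model
        err = np.asarray(setpoint_px, dtype=float) - np.asarray(centroid, dtype=float)
        if np.linalg.norm(err) < tol_px:
            return True                                 # setpoint reached, advance to next stage
        cmd = KP * err + KD * (err - prev_err) / dt     # PD control on the pixel error
        robot.send_velocity(cmd)
        prev_err = err
    return False                                        # timed out

def collect_grasp_episode(perception, robot, drag_validate):
    """One autonomous grasp attempt: approach stage by stage, grasp, and label success."""
    for setpoint_px in STAGE_SETPOINTS_PX:
        if not servo_to(setpoint_px, perception, robot):
            return False
    if robot.depth() >= GRASP_DEPTH_THRESHOLD:          # depth condition for closing the gripper
        robot.close_gripper()
        return drag_validate(robot)                     # success label comes from drag validation
    return False
```

A failed stage or grasp returning False is where the regrasp and backup recoveries of Figure 4 would hook in.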