
UMI-Underwater: Learning Underwater Manipulation without Underwater Teleoperation

Hao Li, Long Yin Chung, Jack Goler, Ryan Zhang, Xiaochi Xie, Huy Ha, Shuran Song, Mark Cutkosky

Abstract

Underwater robotic grasping is difficult due to degraded, highly variable imagery and the expense of collecting diverse underwater demonstrations. We introduce a system that (i) autonomously collects successful underwater grasp demonstrations via a self-supervised data collection pipeline and (ii) transfers grasp knowledge from on-land human demonstrations through a depth-based affordance representation that bridges the on-land-to-underwater domain gap and is robust to lighting and color shift. An affordance model trained on on-land handheld demonstrations is deployed underwater zero-shot via geometric alignment, and an affordance-conditioned diffusion policy is then trained on underwater demonstrations to generate control actions. In pool experiments, our approach improves grasping performance and robustness to background shifts, and enables generalization to objects seen only in on-land data, outperforming RGB-only baselines. Code, videos, and additional results are available at https://umi-under-water.github.io.
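At inference time the system described above is two-stage: a depth-based affordance model, trained only on on-land handheld demonstrations, predicts where to grasp, and an affordance-conditioned policy, trained on autonomously collected underwater demonstrations, predicts how to move. The sketch below only illustrates that conditioning interface; the module names, tensor shapes, and the plain regression head standing in for the diffusion denoiser are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the two-stage inference path (assumed structure; names,
# shapes, and the stand-in policy head are illustrative, not the paper's code).
import torch
import torch.nn as nn

class AffordanceModel(nn.Module):
    """Predicts a per-pixel grasp-affordance heatmap from a depth image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1),
        )
    def forward(self, depth):                    # depth: (B, 1, H, W)
        return torch.sigmoid(self.net(depth))    # affordance heatmap: (B, 1, H, W)

class AffordanceConditionedPolicy(nn.Module):
    """Stand-in for the diffusion policy: maps (depth, affordance) to an action chunk."""
    def __init__(self, horizon=8, action_dim=7):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 16, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(16, horizon * action_dim)
        self.horizon, self.action_dim = horizon, action_dim
    def forward(self, depth, affordance):
        feat = self.encoder(torch.cat([depth, affordance], dim=1))
        return self.head(feat).view(-1, self.horizon, self.action_dim)

affordance_model = AffordanceModel()         # trained on on-land handheld demonstrations
policy = AffordanceConditionedPolicy()       # trained on underwater demonstrations

depth = torch.rand(1, 1, 96, 96)             # placeholder depth frame from the ROV camera
with torch.no_grad():
    affordance = affordance_model(depth)     # zero-shot from land-trained model
    actions = policy(depth, affordance)      # affordance-conditioned action chunk
print(actions.shape)                         # (1, 8, 7)
```

Because the affordance prediction depends only on depth, the same heatmap representation is available on land and underwater, which is the property the abstract relies on to bridge the domain gap.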



Figures (7)

  • Figure 1: UMI-Underwater. We address data-collection and generalization bottlenecks in underwater manipulation by pairing autonomous, self-supervised data collection with a zero-shot, depth-based affordance predictor that transfers directly from land to water.
  • Figure 2: ROV Manipulation Setup for Self-supervised Underwater Data Collection. The setup consists of a swimming pool with objects scattered across the workspace for repeated grasp attempts. Four fixed external cameras are mounted at the pool corners to provide real-time 3D localization used for safety functions (but not provided as inputs to the learned policy). The tethered ROV operates in this environment to execute the staged grasping routine.
  • Figure 3: Autonomous Data Collection Pipeline. Our heuristic controller autonomously collects grasping episodes by using a segmentation model to select a target, servoing the object centroid to stage-specific pixel setpoints using PD control, closing the gripper when a depth threshold is met, and labeling success via drag validation (a hedged code sketch of this routine follows the figure list).
  • Figure 4: Autonomous Recovery Strategies include (a) regrasp after failed grasps and (b) backup when the robot overshoots. These strategies raise the success rate of demonstration collection and also make the learned policy more robust by including recovery behavior in the demonstrations.
  • Figure 5: UMI-Aquatic on-land demonstration setup. Our handheld gripper with an iPhone camera system and AprilTags enables portable data collection and reliable gripper-state tracking for automatic demonstration labeling. Cropping and geometric warping via reprojection align the iPhone view to match the underwater robot camera.
  • ...and 2 more figures
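For concreteness, here is a hedged sketch of the staged grasping routine summarized in the Figure 3 caption: a segmentation model supplies the target centroid, a PD controller servos it to stage-specific pixel setpoints, the gripper closes once a depth threshold is met, and success is labeled by drag validation. All interface names (segment_centroid, send_velocity, close_gripper, drag_validate), gains, setpoints, and thresholds are illustrative placeholders, not the paper's code.

```python
# Minimal sketch of the staged routine in Figure 3, under assumed interfaces:
# perception.segment_centroid(), robot.depth(), robot.send_velocity(),
# robot.close_gripper(), and drag_validate() are hypothetical placeholders,
# and the gains, setpoints, and threshold below are illustrative only.
import numpy as np

KP, KD = 0.004, 0.001                 # assumed PD gains on the pixel error
GRASP_DEPTH_THRESHOLD = 0.25          # assumed depth value that triggers the grasp
STAGE_SETPOINTS_PX = [(320, 180), (320, 300), (320, 420)]  # assumed stage setpoints

def servo_to(setpoint_px, perception, robot, dt=0.1, tol_px=8, max_steps=500):
    """PD-servo the segmented object centroid toward one stage's pixel setpoint."""
    prev_err = np.zeros(2)
    for _ in range(max_steps):
        centroid = perception.segment_centroid()        # centroid from the segmentation model
        err = np.asarray(setpoint_px, dtype=float) - np.asarray(centroid, dtype=float)
        if np.linalg.norm(err) < tol_px:
            return True                                 # setpoint reached, advance to next stage
        cmd = KP * err + KD * (err - prev_err) / dt     # PD control on the pixel error
        robot.send_velocity(cmd)
        prev_err = err
    return False                                        # timed out

def collect_grasp_episode(perception, robot, drag_validate):
    """One autonomous grasp attempt: approach stage by stage, grasp, and label success."""
    for setpoint_px in STAGE_SETPOINTS_PX:
        if not servo_to(setpoint_px, perception, robot):
            return False
    if robot.depth() >= GRASP_DEPTH_THRESHOLD:          # depth condition for closing the gripper
        robot.close_gripper()
        return drag_validate(robot)                     # success label comes from drag validation
    return False
```

A failed stage or grasp returning False is where the regrasp and backup recoveries of Figure 4 would hook in.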