DeLTa: Demonstration and Language-Guided Novel Transparent Object Manipulation
Taeyeop Lee, Gyuree Kang, Bowen Wen, Youngho Kim, Seunghyeok Back, In So Kweon, David Hyunchul Shim, Kuk-Jin Yoon
TL;DR
The paper tackles robust transparent-object manipulation, a setting in which depth sensing is unreliable and long-horizon tasks demand precision. It introduces DeLTa, a framework that integrates stereo depth estimation, 6D pose estimation, and vision-language planning guided by a single demonstration. Key contributions include 4D hand-object interaction modeling from human videos, a demonstration-based trajectory database, a VLM-based task planner with plan grounding, and a last-inch motion planner for safe, collision-aware execution. Real-world experiments show that DeLTa outperforms strong baselines on long-horizon tasks, highlighting its practical relevance for real-world human-robot collaboration.
Abstract
Despite the prevalence of transparent object interactions in everyday human life, research on robotic manipulation of transparent objects remains limited to short-horizon tasks and basic grasping capabilities. Although some methods have partially addressed these issues, most generalize poorly to novel objects and lack the precision required for long-horizon manipulation. To address these limitations, we propose DeLTa (Demonstration and Language-Guided Novel Transparent Object Manipulation), a novel framework that integrates depth estimation, 6D pose estimation, and vision-language planning for precise long-horizon manipulation of transparent objects guided by natural-language task instructions. A key advantage of our method is its single-demonstration approach, which generalizes 6D trajectories to novel transparent objects without requiring category-level priors or additional training. Additionally, we present a task planner that refines the VLM-generated plan to account for the constraints of a single-arm, eye-in-hand robot in long-horizon object manipulation tasks. Through comprehensive evaluation, we demonstrate that our method significantly outperforms existing transparent object manipulation approaches, particularly in long-horizon scenarios that require precise manipulation. Project page: https://sites.google.com/view/DeLTa25/
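The abstract does not spell out how a single demonstrated 6D trajectory is carried over to a new object. One common object-centric approach, assumed here for illustration and not necessarily DeLTa's exact procedure, is to express each demonstrated gripper pose relative to the demonstrated object's 6D pose and re-anchor it at the new object's estimated pose: T_new = P_new · P_demo^-1 · T_demo. The sketch below uses plain 4x4 homogeneous matrices; the helper names (`pose`, `invert_rigid`, `transfer_trajectory`) are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # 4x4 homogeneous transform, row-major


def pose(x: float, y: float, z: float, yaw: float = 0.0) -> Matrix:
    """Build a rigid transform with translation (x, y, z) and rotation yaw about z."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x], [s, c, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def invert_rigid(t: Matrix) -> Matrix:
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]  # R transposed
    p = [t[i][3] for i in range(3)]
    p_inv = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def transfer_trajectory(demo_obj: Matrix, new_obj: Matrix, demo_ee: List[Matrix]) -> List[Matrix]:
    """Re-target world-frame gripper poses from the demo object pose to a new object pose."""
    # Map world poses into the demo object's frame once...
    obj_from_world = invert_rigid(demo_obj)
    # ...then place each object-relative pose at the new object's estimated pose.
    return [matmul(new_obj, matmul(obj_from_world, ee)) for ee in demo_ee]


if __name__ == "__main__":
    demo_obj = pose(0.0, 0.0, 0.0)                 # object pose during the demonstration
    new_obj = pose(0.5, 0.2, 0.0, math.pi / 2)     # estimated pose of the novel object
    grasp = transfer_trajectory(demo_obj, new_obj, [pose(0.1, 0.0, 0.1)])[0]
    print([round(grasp[i][3], 3) for i in range(3)])
```

Running the script prints the re-targeted grasp position, roughly `[0.5, 0.3, 0.1]`: the 0.1 m offset along the demo object's x-axis is rotated by the new object's 90° yaw. Because the trajectory is stored relative to the object, no category-level prior is needed, only a 6D pose estimate for each new instance.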
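The abstract states that the task planner refines the VLM-generated plan for a single-arm, eye-in-hand robot but does not list the refinement rules. The sketch below is purely illustrative and assumes two plausible rules: a single arm can hold only one object, so a held object is first set down at a staging area before another pick; and a wrist-mounted camera must look at an object or placement target to obtain a fresh 6D pose before acting on it. The `Step` type, the action vocabulary, and the `staging_area` name are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    action: str  # hypothetical vocabulary: "observe", "pick", "place"
    target: str  # object or location name


def refine_plan(vlm_plan: List[Step], staging_area: str = "staging_area") -> List[Step]:
    """Insert steps a single-arm, eye-in-hand robot would need (illustrative rules only)."""
    refined: List[Step] = []
    held: Optional[str] = None
    for step in vlm_plan:
        if step.action == "pick":
            # Single arm: free the gripper before grasping another object.
            # The staging area is assumed to sit at a known, fixed pose.
            if held is not None:
                refined.append(Step("place", staging_area))
                held = None
            # Eye-in-hand: observe the object for a fresh 6D pose before grasping.
            refined.append(Step("observe", step.target))
            refined.append(step)
            held = step.target
        elif step.action == "place":
            # Eye-in-hand: observe the placement target (e.g. a rack) before placing.
            refined.append(Step("observe", step.target))
            refined.append(step)
            held = None
        else:
            refined.append(step)
    return refined


if __name__ == "__main__":
    plan = [Step("pick", "cup"), Step("pick", "bottle"), Step("place", "rack")]
    for s in refine_plan(plan):
        print(s.action, s.target)
```

For the three-step plan above, the refined plan observes and picks the cup, sets it down at the staging area, observes and picks the bottle, then observes the rack before placing. Rule-based post-processing like this keeps the VLM free to reason about the task while hardware constraints are enforced deterministically.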
