Adapting Skills to Novel Grasps: A Self-Supervised Approach
Georgios Papagiannis, Kamil Dreczkowski, Vitalis Vosylius, Edward Johns
TL;DR
This work addresses adapting manipulation trajectories defined for a single grasp pose to novel deployment grasps, without regrasping and without explicit object models. It introduces an autonomous, self-supervised pipeline in which an external camera collects RGB and/or depth images while the end-effector moves with the grasped object. These images train an alignment network that predicts the end-effector displacement needed to align any grasp with a fixed reference grasp; that displacement yields the corrective transformation used to adapt the trajectory. The method requires no object CAD model and no camera calibration, and it can operate on RGB, depth, or RGB-D input, with RGB delivering the best performance across tasks. Across 1360 real-world evaluations, it achieves an average 28.5% higher success rate than the best-performing baseline, and it is compatible with skills obtained via imitation learning (e.g., DOME). The key contributions are (1) a reference-grasp alignment framework that bypasses explicit pose estimation, (2) a self-supervised data collection and segmentation strategy that is robust to occlusion, and (3) a corrective transformation that enables deployment across diverse grasps and tasks, reducing the need for re-learning or manual trajectory redesign in unstructured environments.
Abstract
In this paper, we study the problem of adapting manipulation trajectories involving grasped objects (e.g., tools) defined for a single grasp pose to novel grasp poses. A common approach is to explicitly define a new trajectory for each possible grasp, but this is highly inefficient. Instead, we propose a method that adapts such trajectories directly and requires only a period of self-supervised data collection, during which a camera observes the robot's end-effector moving with the object rigidly grasped. Importantly, our method requires no prior knowledge of the grasped object (such as a 3D CAD model), works with RGB images, depth images, or both, and requires no camera calibration. Through a series of real-world experiments involving 1360 evaluations, we find that self-supervised RGB data consistently outperforms alternatives that rely on depth images, including several state-of-the-art pose estimation methods. Compared to the best-performing baseline, our method achieves an average of 28.5% higher success rate when adapting manipulation trajectories to novel grasps on several everyday tasks. Videos of the experiments are available on our webpage at https://www.robot-learning.uk/adapting-skills
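
To make the idea of a corrective transformation concrete, the following is a minimal sketch rather than the implementation evaluated in this work. It assumes that the predicted end-effector displacement is available as a 4x4 homogeneous transform expressed in the end-effector frame, and that the grasp is rigid. The helper names (`make_pose`, `invert_pose`, `adapt_trajectory`) and all numeric values are hypothetical. The two object-in-gripper poses are never estimated by the method itself; they appear here only to stand in for the network output and to check the algebra.

```python
import math

def matmul(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def make_pose(yaw, translation):
    """4x4 homogeneous transform with a rotation of `yaw` radians about z."""
    c, s = math.cos(yaw), math.sin(yaw)
    x, y, z = translation
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def invert_pose(pose):
    """Invert a rigid transform: rotation R^T, translation -R^T t."""
    rot_t = [[pose[j][i] for j in range(3)] for i in range(3)]
    trans = [-sum(rot_t[i][k] * pose[k][3] for k in range(3)) for i in range(3)]
    return [rot_t[i] + [trans[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def adapt_trajectory(reference_trajectory, corrective):
    """Right-multiply every end-effector pose by the corrective transform."""
    return [matmul(pose, corrective) for pose in reference_trajectory]

# Hypothetical object-in-gripper poses for the reference and the novel grasp.
ee_T_obj_ref = make_pose(0.0, (0.0, 0.0, 0.10))
ee_T_obj_new = make_pose(0.3, (0.02, -0.01, 0.12))

# Stand-in for the alignment network's output (assumed to be expressed in the
# end-effector frame): the displacement that moves the novel grasp onto the
# reference grasp.
corrective = matmul(ee_T_obj_ref, invert_pose(ee_T_obj_new))

# Hypothetical skill trajectory, defined for the reference grasp.
reference_trajectory = [make_pose(0.1 * k, (0.4, 0.05 * k, 0.3)) for k in range(5)]
adapted_trajectory = adapt_trajectory(reference_trajectory, corrective)

# Check: the object follows the same path under both grasps.
object_path_ref = [matmul(p, ee_T_obj_ref) for p in reference_trajectory]
object_path_new = [matmul(p, ee_T_obj_new) for p in adapted_trajectory]
max_error = max(abs(a[i][j] - b[i][j])
                for a, b in zip(object_path_ref, object_path_new)
                for i in range(4) for j in range(4))
```

Under these assumptions the same corrective transform is applied to every waypoint, so the grasped object traces the path the skill was designed for even though the end-effector path changes; `max_error` stays at floating-point round-off level.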
