Adapting Skills to Novel Grasps: A Self-Supervised Approach

Georgios Papagiannis, Kamil Dreczkowski, Vitalis Vosylius, Edward Johns

TL;DR

This work adapts manipulation trajectories defined for a single grasp pose to novel deployment grasps, without regrasping and without an explicit object model. It introduces an autonomous, self-supervised pipeline in which an external camera collects RGB (or depth) images while the end-effector moves with the object rigidly grasped. These images train an alignment network that predicts the end-effector displacement needed to align any grasp to a fixed reference grasp, from which a corrective transformation for the skill trajectory is computed. The method requires no object CAD models or camera calibration and can operate on RGB, depth, or RGB-D input, with RGB delivering the best performance across tasks. Across 1360 real-world evaluations, it achieves an average of 28.5% higher success rate than the best-performing baseline, and it is compatible with skills obtained via imitation learning (e.g., DOME). The key contributions are (1) a reference-grasp alignment framework that bypasses explicit pose estimation, (2) a self-supervised data collection and segmentation strategy robust to occlusion, and (3) a corrective transformation that enables deployment across diverse grasps and tasks, reducing the need for re-learning or manual trajectory redesign in unstructured environments.

Abstract

In this paper, we study the problem of adapting manipulation trajectories involving grasped objects (e.g. tools) defined for a single grasp pose to novel grasp poses. A common approach to address this is to define a new trajectory for each possible grasp explicitly, but this is highly inefficient. Instead, we propose a method to adapt such trajectories directly while only requiring a period of self-supervised data collection, during which a camera observes the robot's end-effector moving with the object rigidly grasped. Importantly, our method requires no prior knowledge of the grasped object (such as a 3D CAD model), it can work with RGB images, depth images, or both, and it requires no camera calibration. Through a series of real-world experiments involving 1360 evaluations, we find that self-supervised RGB data consistently outperforms alternatives that rely on depth images, including several state-of-the-art pose estimation methods. Compared to the best-performing baseline, our method achieves an average of 28.5% higher success rate when adapting manipulation trajectories to novel grasps on several everyday tasks. Videos of the experiments are available on our webpage at https://www.robot-learning.uk/adapting-skills
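
To make the notion of a corrective transformation concrete, the following is a short rigid-body sketch, not the paper's own derivation. It assumes, purely for illustration, that the skill grasp pose ${^{\textrm{S}}}\mathbf{T}_{EO}$ and the deployment grasp pose ${^{\textrm{D}}}\mathbf{T}_{EO}$ (object in the EEF frame) were known. The symbols $\mathbf{T}_{WE}(t)$, $\mathbf{T}^{\textrm{D}}_{WE}(t)$ and $\mathbf{T}_{\textrm{corr}}$ are hypothetical names introduced here. The method described above avoids estimating these grasp poses explicitly.

```latex
% Object trajectory produced by the skill (skill grasp, EEF trajectory T_WE(t)):
\mathbf{T}_{WO}(t) = \mathbf{T}_{WE}(t)\,{^{\textrm{S}}}\mathbf{T}_{EO}

% Requirement at deployment: the object follows the same trajectory under the new grasp:
\mathbf{T}^{\textrm{D}}_{WE}(t)\,{^{\textrm{D}}}\mathbf{T}_{EO} = \mathbf{T}_{WE}(t)\,{^{\textrm{S}}}\mathbf{T}_{EO}

% Solving for the adapted EEF trajectory:
\mathbf{T}^{\textrm{D}}_{WE}(t) = \mathbf{T}_{WE}(t)\,\mathbf{T}_{\textrm{corr}},
\qquad
\mathbf{T}_{\textrm{corr}} = {^{\textrm{S}}}\mathbf{T}_{EO}\,\bigl({^{\textrm{D}}}\mathbf{T}_{EO}\bigr)^{-1}
```

Under these assumptions the correction is a single constant transform applied on the right of every EEF pose in the trajectory, which is why the grasp itself does not need to change.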

Paper Structure (28 sections, 4 equations, 13 figures, 3 tables)

Figures (13)

  • Figure 1: Skill Acquisition: (a) With the hammer grasped at the skill grasp, the skill's trajectory (defined e.g. via a human demonstration) specifies how to hammer the nail. Skill Deployment without Adaptation: (b) If the hammer is grasped differently to the skill grasp, executing the skill's trajectory leads to task failure. Skill Deployment with Adaptation: (c) A corrective transformation is applied to the skill's EEF trajectory such that (without changing the grasp pose) the hammer under the deployment grasp follows the same trajectory it followed under the skill grasp, to successfully complete the task.
  • Figure .1: Images sampled from the dataset $\mathcal{D}$ collected in a self-supervised manner. The image on the top left (marked as red) shows the EEF at the reference pose with the object grasped at the reference grasp. While the EEF moves around the reference pose to emulate different arbitrary grasps, the object remains rigidly grasped at the reference grasp.
  • Figure .1: Mean and standard deviation of the error in computing the corrective transformation between grasps for the Spoon and Glass objects, averaged across all DoFs, for position and orientation.
  • Figure 2: Self-supervised Data Collection: (a) An example of a possible reference grasp. (b) The EEF at a potential reference pose ${^{\textrm{R}}}\mathbf{T}_{WE}$ in front of the external camera. The object is at the reference grasp. (c) With the object rigidly grasped at the reference grasp, we sample and move the EEF to random poses ${^{\textrm{N}}}\mathbf{T}_{EE^{'}}$ relative to the reference pose to collect, in a self-supervised manner, image-transformation pairs that emulate different grasps. Emulating Different Grasps: (d.1) By transforming the EEF and the object at the reference grasp by ${^{\textrm{N}}}\mathbf{T}_{EE^{'}}$, we emulate an arbitrary grasp with some grasp pose ${^{\textrm{A}}}\mathbf{T}_{EO}$ relative to the reference pose, as shown in (d.2). (e) Then, if the object is grasped at that arbitrary grasp pose ${^{\textrm{A}}}\mathbf{T}_{EO}$ emulated by ${^{\textrm{N}}}\mathbf{T}_{EE^{'}}$, moving the EEF to ${^{\textrm{N}}}\mathbf{T}_{EE^{'}}^{-1}$ relative to the reference pose aligns the object to the reference grasp.
  • Figure .2: Figures (a) and (c) are images sampled from the dataset $\mathcal{D}$ collected during self-supervised data collection. Each image in (a) and (c) emulates the grasp depicted in Figures (b) and (d), respectively, where the EEF is at the reference pose. We manually reproduced the grasps in Figures (b) and (d) for visualisation.
  • ...and 8 more figures
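
The grasp-emulation argument in the Figure 2 caption can be checked numerically. The snippet below is a minimal sketch with NumPy, not the authors' code. It samples random rigid transforms for the reference pose, the reference grasp and the EEF displacement, then verifies that moving the EEF by the inverse displacement brings an emulated grasp back to the object pose seen at the reference grasp. All names (`random_transform`, `invert`, `T_WE_ref`, `T_EO_ref`, `T_EE_new`) are hypothetical, and the reference grasp pose is used here only for checking: the method itself never needs to know it.

```python
import numpy as np


def random_transform(rng, max_angle=0.5, max_shift=0.1):
    """Sample a random 4x4 rigid transform (rotation via Rodrigues' formula)."""
    axis = rng.normal(size=3)
    axis /= np.linalg.norm(axis)
    angle = rng.uniform(-max_angle, max_angle)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = rng.uniform(-max_shift, max_shift, size=3)
    return T


def invert(T):
    """Closed-form inverse of a rigid transform."""
    T_inv = np.eye(4)
    T_inv[:3, :3] = T[:3, :3].T
    T_inv[:3, 3] = -T[:3, :3].T @ T[:3, 3]
    return T_inv


rng = np.random.default_rng(0)
T_WE_ref = random_transform(rng)  # reference EEF pose in the world frame
T_EO_ref = random_transform(rng)  # reference grasp (object in EEF frame); for checking only
T_EE_new = random_transform(rng)  # sampled EEF displacement relative to the reference pose

# Data collection: the EEF moves by T_EE_new with the object held at the reference grasp.
T_WO_observed = T_WE_ref @ T_EE_new @ T_EO_ref

# The same object pose would be seen with the EEF at the reference pose and
# the object held at the emulated grasp T_EE_new @ T_EO_ref.
T_EO_emulated = T_EE_new @ T_EO_ref
assert np.allclose(T_WE_ref @ T_EO_emulated, T_WO_observed)

# Alignment: with the object held at the emulated grasp, moving the EEF by the
# inverse displacement restores the object pose seen at the reference grasp.
T_WO_aligned = T_WE_ref @ invert(T_EE_new) @ T_EO_emulated
T_WO_reference = T_WE_ref @ T_EO_ref
aligned = np.allclose(T_WO_aligned, T_WO_reference)
print(aligned)
```

Because each image is labelled with the displacement the robot itself executed, image-transformation pairs of this kind can be gathered without human annotation or a calibrated camera, which is consistent with the self-supervised setup the captions describe.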