One-Shot Manipulation Strategy Learning by Making Contact Analogies

Yuyao Liu, Jiayuan Mao, Joshua Tenenbaum, Tomás Lozano-Pérez, Leslie Pack Kaelbling

TL;DR

MAGIC tackles one-shot manipulation by learning a generalizable contact-based strategy from a single demonstration. It combines a global-to-local contact point matching pipeline (DINOv2 pretrained features for global correspondence, multi-scale curvature estimation for local refinement) with simulation verification to generate SE(3) trajectories on novel objects. The approach generalizes across scooping, hanging, and hooking tasks in both simulation and real-world robot experiments, outperforming traditional global or local shape matching baselines and reducing search time. This yields a practical, fast framework for tool-use manipulation that generalizes to unseen object categories and supports downstream motion planning and tool selection without large task-specific datasets.

Abstract

We present a novel approach, MAGIC (manipulation analogies for generalizable intelligent contacts), for one-shot learning of manipulation strategies with fast and extensive generalization to novel objects. By leveraging a reference action trajectory, MAGIC effectively identifies similar contact points and sequences of actions on novel objects to replicate a demonstrated strategy, such as using different hooks to retrieve distant objects of different shapes and sizes. Our method is based on a two-stage contact-point matching process that combines global shape matching using pretrained neural features with local curvature analysis to ensure precise and physically plausible contact points. We experiment with three tasks including scooping, hanging, and hooking objects. MAGIC demonstrates superior performance over existing methods, achieving significant improvements in runtime speed and generalization to different object categories. Website: https://magic-2024.github.io/ .
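
The two-stage matching described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `match_contact_point`, the `top_k` shortlist, and the use of a single scalar curvature per point are assumptions made for illustration. Per-point feature descriptors (for example, from a pretrained model) and curvature estimates are assumed to be given.

```python
import numpy as np

def match_contact_point(ref_feat, ref_curv, cand_feats, cand_curvs, top_k=5):
    """Return the index of the candidate point that best matches the reference contact.

    Hypothetical helper: global stage shortlists by feature similarity,
    local stage picks the shortlisted point with the closest curvature.
    """
    # Stage 1 (global): cosine similarity between feature descriptors.
    ref = ref_feat / np.linalg.norm(ref_feat)
    cands = cand_feats / np.linalg.norm(cand_feats, axis=1, keepdims=True)
    similarity = cands @ ref
    shortlist = np.argsort(-similarity)[:top_k]

    # Stage 2 (local): smallest curvature difference within the shortlist.
    curv_diff = np.abs(cand_curvs[shortlist] - ref_curv)
    return int(shortlist[np.argmin(curv_diff)])

# Toy data: point 2 has the right curvature but the wrong global features,
# so the global stage excludes it before the local stage runs.
ref_feat = np.array([1.0, 0.0])
cand_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
cand_curvs = np.array([0.5, 0.1, 0.1])
best = match_contact_point(ref_feat, 0.1, cand_feats, cand_curvs, top_k=2)
```

In this toy case the global stage keeps points 0 and 1, and the local stage then prefers point 1 because its curvature matches the reference.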

Paper Structure

This paper contains 11 sections, 4 equations, 15 figures, 5 tables.

Figures (15)

  • Figure 1: We introduce MAGIC (Manipulation Analogies for Generalizable Intelligent Contacts), a pipeline that is capable of learning manipulation strategies from single demonstrations and applying them to novel objects.
  • Figure 2: The overall pipeline of MAGIC. (a) We first extract contact points from the reference trajectory. (b) Then, we compute a global and local contact point matching score to select candidate contact points on novel objects. (c) The generated contact points will be used for motion retargeting or motion planning, and the final motion will be simulated and verified by a physical simulator.
  • Figure 3: Global and Local Contact Point Matching. The contact point matching process consists of two phases: (a) global matching using DINOv2 (Oquab et al., 2023) features; (b) local matching involving multi-scale curvature estimation and refinements using irrelevant point suppression and convexity matching.
  • Figure 4: (a) The observation scale at which we perform estimation affects both the magnitude and the sign of the estimated curvature. (b) Filtering out the irrelevant points gives a more accurate curvature estimate. (c) We perform a local search within the observation region to find the contact point with the correct convexity.
  • Figure 5: (a) Global shape matching fails to provide effective functional alignment due to the lack of awareness of local contact points. (b) Local curvature helps identify more accurate contact points and contact normals. (c) Pretrained DINOv2 features enhance data efficiency by selecting the optimal contact (the red star) and grasping points from those that curvature alone cannot distinguish (the green region). We also visualize the DINOv2 matching heatmaps.
  • ...and 10 more figures
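
The scale dependence noted in the Figure 4(a) caption can be reproduced on a toy example. The sketch below is a simplification and not the paper's estimator: it works on a 2D profile rather than a 3D surface, and it estimates signed curvature with a local quadratic fit. The function name `curvature_at_scale` and the toy profile are assumptions chosen for illustration.

```python
import numpy as np

def curvature_at_scale(xs, ys, x0, radius):
    """Signed curvature of the curve y(x) at x0, fitted from points within `radius`."""
    mask = np.abs(xs - x0) <= radius
    # Fit y ~ a*t^2 + b*t + c in coordinates centred on the query point.
    a, b, _ = np.polyfit(xs[mask] - x0, ys[mask], 2)
    # Curvature of a graph y(t) at t = 0: y'' / (1 + y'^2)^(3/2).
    return 2.0 * a / (1.0 + b * b) ** 1.5

# Toy profile: a wide bowl with a narrow bump at its centre.
xs = np.linspace(-1.0, 1.0, 2001)
ys = 0.5 * xs**2 + 0.02 * np.exp(-(xs / 0.05) ** 2)

# Estimate curvature at the centre with a small and a large observation radius.
curvatures = {r: curvature_at_scale(xs, ys, 0.0, r) for r in (0.02, 1.0)}
```

At the small radius the fit sees only the bump and reports negative curvature; at the large radius the bowl dominates and the sign flips to positive. This is why a single observation scale can mislabel the convexity of a contact region.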