GRIM: Task-Oriented Grasping with Conditioning on Generative Examples

Shailesh; Alok Raj; Nayan Kumar; Priya Shukla; Andrew Melnik; Michael Beetz; Gora Chand Nandi

TL;DR

GRIM tackles data scarcity in Task-Oriented Grasping (TOG) with a training-free, memory-driven approach that retrieves functional priors from heterogeneous sources and aligns them in 3D using semantic cues. The framework follows a retrieve-align-transfer pipeline with four steps: memory creation from AI-generated videos, web images, and expert demonstrations; memory retrieval via joint DINO- and CLIP-based similarity; semantic 3D alignment to transfer full 6D grasps; and a refine-and-rank step over geometrically stable candidates to ensure executability. Key contributions include a memory construction paradigm that does not require task-specific labels, a robust semantic 3D alignment strategy guided by dense features, and a grasp transfer mechanism that preserves task intent while respecting object geometry. GRIM generalizes well, reaching 0.67 mAP on TaskGrasp and performing successfully in real-world experiments. The results show that generative models and cross-domain exemplars can deliver state-of-the-art TOG performance without large annotated datasets, offering a scalable, data-efficient path toward adaptable robotic manipulation.
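The summary names joint DINO- and CLIP-based similarity for memory retrieval but does not give the scoring rule. The sketch below assumes a simple weighted sum of two cosine similarities, with short toy vectors standing in for real DINO and CLIP features. `MemoryEntry`, `retrieve`, and the weight `alpha` are hypothetical names chosen for illustration, not GRIM's published implementation.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    """One object-task exemplar in a hypothetical retrieval memory."""
    name: str
    task: str
    visual_emb: list[float]  # toy stand-in for a DINO visual feature
    task_emb: list[float]    # toy stand-in for a CLIP feature


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query_visual, query_task, memory, alpha=0.5):
    """Return the entry maximizing alpha*cos(visual) + (1 - alpha)*cos(task).

    The linear weighting with an assumed alpha is an illustrative choice.
    """
    def score(entry: MemoryEntry) -> float:
        return (alpha * cosine(query_visual, entry.visual_emb)
                + (1 - alpha) * cosine(query_task, entry.task_emb))
    return max(memory, key=score)


# Toy usage: a query that looks like a mug and asks for "pour".
memory = [
    MemoryEntry("mug_pour", "pour", [1.0, 0.0, 0.0], [1.0, 0.0]),
    MemoryEntry("knife_cut", "cut", [0.0, 1.0, 0.0], [0.0, 1.0]),
]
best = retrieve([0.9, 0.1, 0.0], [1.0, 0.0], memory)
print(best.name)
```

Combining both terms lets an exemplar win only when it both looks like the query object and matches the requested task; setting `alpha` to 1.0 reduces the score to visual similarity alone.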

Abstract

Task-Oriented Grasping (TOG) requires robots to select grasps that are functionally appropriate for a specified task, a challenge that demands an understanding of task semantics, object affordances, and functional constraints. We present GRIM (Grasp Re-alignment via Iterative Matching), a training-free framework that addresses these challenges by combining Video Generation Models (VGMs) with a retrieve-align-transfer pipeline. GRIM builds a memory of object-task exemplars that can be sourced from generative models as well as from web images and human demonstrations. For a new object and task, the best-matching exemplar is retrieved from this memory, and its task-oriented grasp is transferred to the new object and refined by evaluating it against a set of geometrically stable candidate grasps, ensuring both functional suitability and physical feasibility. GRIM demonstrates strong generalization and achieves state-of-the-art performance on standard TOG benchmarks. Project website: https://grim-tog.github.io
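The abstract states that the transferred grasp is refined against a set of geometrically stable candidates but does not define the comparison. One plausible reading is sketched below: each grasp is assumed to be a (position, unit quaternion) pair, and stable candidates are ranked by a weighted pose distance to the transferred grasp. `grasp_distance`, `refine_and_rank`, and the rotation weight `beta` are hypothetical, not GRIM's actual scoring.

```python
import math


def rotation_angle(q1, q2):
    """Geodesic angle in radians between unit quaternions given as (w, x, y, z)."""
    dot = abs(sum(a * b for a, b in zip(q1, q2)))  # abs handles q and -q
    return 2.0 * math.acos(min(1.0, dot))           # clamp guards float drift


def grasp_distance(g1, g2, beta=0.1):
    """Translation distance plus beta * rotation angle (beta is an assumed weight)."""
    (p1, q1), (p2, q2) = g1, g2
    return math.dist(p1, p2) + beta * rotation_angle(q1, q2)


def refine_and_rank(transferred, candidates, beta=0.1):
    """Sort stable candidate grasps by closeness to the transferred task grasp."""
    return sorted(candidates, key=lambda g: grasp_distance(transferred, g, beta))


# Toy usage: positions in metres, orientations as unit quaternions.
identity = (1.0, 0.0, 0.0, 0.0)
yaw90 = (math.cos(math.pi / 4), 0.0, 0.0, math.sin(math.pi / 4))
transferred = ((0.10, 0.00, 0.05), identity)
candidates = [
    ((0.30, 0.00, 0.05), identity),  # far away, same orientation
    ((0.11, 0.00, 0.05), yaw90),     # close, but rotated 90 degrees about z
    ((0.12, 0.01, 0.05), identity),  # close, same orientation
]
ranked = refine_and_rank(transferred, candidates)
best = ranked[0]
```

Snapping to the nearest stable candidate, rather than executing the transferred pose directly, is one way to keep the task-relevant grasp region while relying on a grasp already judged physically feasible.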
