GraspLDP: Towards Generalizable Grasping Policy via Latent Diffusion

Enda Xiang; Haoxiang Ma; Xinzhu Ma; Zicheng Liu; Di Huang

TL;DR

This paper incorporates grasp prior knowledge into the diffusion policy framework and introduces a self-supervised reconstruction objective during diffusion to embed the graspness prior. The resulting approach significantly outperforms baseline methods and exhibits strong dynamic grasping capabilities.

Abstract

This paper focuses on enhancing the grasping precision and generalization of manipulation policies learned via imitation learning. Diffusion-based policy learning methods have recently become the mainstream approach for robotic manipulation tasks. As grasping is a critical subtask in manipulation, the ability of imitation-learned policies to execute precise and generalizable grasps merits particular attention. Existing imitation learning techniques for grasping often suffer from imprecise grasp executions, limited spatial generalization, and poor object generalization. To address these challenges, we incorporate grasp prior knowledge into the diffusion policy framework. In particular, we employ a latent diffusion policy to guide action chunk decoding with a grasp pose prior, ensuring that generated motion trajectories adhere closely to feasible grasp configurations. Furthermore, we introduce a self-supervised reconstruction objective during diffusion to embed the graspness prior: at each reverse diffusion step, we reconstruct, from the intermediate representations, wrist-camera images onto which the graspness has been back-projected. Both simulation and real-robot experiments demonstrate that our approach significantly outperforms baseline methods and exhibits strong dynamic grasping capabilities.
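As a rough illustration of what "reverse diffusion in a latent action space under a conditioning signal" involves, the sketch below runs a textbook DDPM reverse process over a small latent vector. It is not the GraspLDP implementation: the function names, the linear noise schedule, and the toy noise predictor (which simply assumes the clean latent equals the conditioning vector, here standing in for a grasp-pose embedding) are all assumptions made for the example. The learned denoiser, the VAE that decodes latents into action chunks, and the graspness reconstruction objective are omitted.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear beta schedule (an assumed choice, not taken from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(z_t, t, cond, alpha_bars):
    """Hypothetical stand-in for a learned noise predictor.

    It pretends the clean latent is exactly `cond` and returns the noise
    that this assumption implies for the current noisy latent z_t.
    """
    a_bar = alpha_bars[t]
    return (z_t - np.sqrt(a_bar) * cond) / np.sqrt(1.0 - a_bar)


def sample_latent(cond, num_steps=50, seed=0):
    """Standard DDPM ancestral sampling of a latent, conditioned on `cond`."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    z = rng.standard_normal(cond.shape)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = toy_eps_model(z, t, cond, alpha_bars)
        # Posterior mean of z_{t-1} given z_t and the predicted noise.
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added at the final step.
        noise = rng.standard_normal(cond.shape) if t > 0 else 0.0
        z = mean + np.sqrt(betas[t]) * noise
    return z


if __name__ == "__main__":
    grasp_embedding = np.array([0.1, -0.2, 0.3, 0.5])  # placeholder values
    print(sample_latent(grasp_embedding))
```

Because the toy predictor treats the conditioning vector as the clean latent, the sampler converges to that vector. In a real policy, a trained network would replace `toy_eps_model`, and a decoder would map the sampled latent to an action chunk.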

Paper Structure (20 sections, 14 equations, 10 figures, 10 tables)

Introduction
Related work
Method
Overview
Grasp Guidance in Latent Space
Visual Graspness Cue
Heuristic Pose Selector
Experiments
Experimental Setup
Simulation Evaluation
Ablation Study
Real World Evaluation
Conclusion
Data Collection for Train and Evaluation
Data Collection for Train and Evaluation
...and 5 more sections

Figures (10)

Figure 1: We introduce GraspLDP, a generalizable grasping policy that integrates the prior from a grasp detector via latent diffusion. Prior works generally (a) predict the grasp pose (e.g., AnyGrasp [fang23anygrasp]) or (b) generate an action sequence (e.g., Diffusion Policy [DBLP:conf/rss/ChiFDXCBS23]) for grasping. In contrast, (c) our method extracts grasp priors from a pre-trained grasp detector for action refinement in latent space, and (d) achieves substantial advantages over previous works in diverse grasping tasks.
Figure 2: Framework of the proposed GraspLDP. In the Action Latent Learning stage, action chunks are refined under the guidance of a grasp pose in a latent space encoded by a VAE. In the Diffusion on Latent Action Space stage, the graspness cue conditions the diffusion model's denoising process and also serves as a reconstruction target for enhancement.
Figure 3: Inference pre-processing, showing our inference pipeline with the Heuristic Pose Selector.
Figure 4: Inference latency of three methods on an RTX 4090 GPU, with the policy action horizon aligned to 8 for each inference. Results for GraspVLA are after acceleration with torch.compile().
Figure 5: Qualitative experimental analysis. (a) Grasping trials using the objects "mug", "mustard bottle", and "thera med" in the simulator. (b) Real-world grasping trials corresponding to in-domain, object generalization, and visual generalization performance. In particular, we use colored LED strips in low-light conditions to simulate visual interference.
...and 5 more figures
