Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach
Xiangtong Yao, Yirui Zhou, Yuan Meng, Liangyu Dong, Lin Hong, Zitao Zhang, Zhenshan Bing, Kai Huang, Fuchun Sun, Alois Knoll
TL;DR
The paper addresses zero-shot transfer of pick-and-place capabilities to unseen grippers without retraining. It introduces a hybrid learning-optimization diffusion policy: the policy is trained on demonstrations from a base gripper and, at inference, a constrained denoising process enforces gripper-specific kinematic and safety constraints to adapt the generated trajectories. Gripper configurations can be provided as free-form language during inference, so a new gripper can be integrated without changing the trained policy. In experiments on a Franka Panda with six grippers, the approach achieves a 93.3% average task success rate across grippers, compared with 23.3–26.7% for diffusion-policy baselines, while handling tool-center-point offsets of 16–23.5 cm and jaw widths of 7.5–11.5 cm. The results demonstrate robust cross-gripper manipulation that retains the sample efficiency of imitation learning and eliminates gripper-specific retraining.
Abstract
Current robotic pick-and-place policies typically require the same gripper configuration at training and inference. This constraint imposes high retraining or fine-tuning costs when adapting to new end-effectors, especially for imitation learning-based approaches. To mitigate this issue, we present a diffusion-based policy with a hybrid learning-optimization framework that enables zero-shot adaptation to novel grippers without additional data collection or policy retraining. During training, the policy learns manipulation primitives from demonstrations collected using a base gripper. At inference, a diffusion-based optimization strategy dynamically enforces kinematic and safety constraints, ensuring that generated trajectories align with the physical properties of unseen grippers. This is achieved through a constrained denoising procedure that adapts trajectories to gripper-specific parameters (e.g., tool-center-point offsets, jaw widths) while preserving collision avoidance and task feasibility. We validate our method on a Franka Panda robot across six gripper configurations, including 3D-printed fingertips, a flexible silicone gripper, and a Robotiq 2F-85 gripper. Our approach achieves a 93.3% average task success rate across grippers (vs. 23.3–26.7% for diffusion policy baselines), supporting tool-center-point offsets of 16–23.5 cm and jaw widths of 7.5–11.5 cm. The results demonstrate that constrained diffusion enables robust cross-gripper manipulation while maintaining the sample efficiency of imitation learning, eliminating the need for gripper-specific retraining. Video and code are available at https://github.com/yaoxt3/GADP.
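To make the idea of constrained denoising concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy stand-in for the learned denoiser (`toy_denoiser`), a trajectory of flange positions plus jaw opening, and a simple projection step applied after every denoising iteration. The function names, the table height, the safety margin, and the projection itself are all hypothetical choices made for illustration. The tool-center-point offset and jaw width values are taken from the ranges reported in the abstract.

```python
import numpy as np

def toy_denoiser(traj, target, step_frac):
    """Hypothetical stand-in for a learned denoiser: move part of the way to `target`."""
    return traj + step_frac * (target - traj)

def project_to_gripper(traj, tcp_offset, max_jaw_width, table_z=0.0, margin=0.01):
    """Project a trajectory onto simple gripper constraints (illustrative assumption).

    traj columns: flange x, flange y, flange z, jaw opening (all in meters).
    The tool tip sits tcp_offset below the flange, so the flange must stay at
    least table_z + margin + tcp_offset high to keep the tip above the table.
    """
    out = traj.copy()
    min_flange_z = table_z + margin + tcp_offset
    out[:, 2] = np.maximum(out[:, 2], min_flange_z)     # safety: tip above table
    out[:, 3] = np.clip(out[:, 3], 0.0, max_jaw_width)  # kinematics: jaw limits
    return out

def constrained_denoise(target, tcp_offset, max_jaw_width, steps=20, seed=0):
    """Start from Gaussian noise; alternate a denoising step and a projection."""
    rng = np.random.default_rng(seed)
    traj = rng.normal(size=target.shape)
    for k in range(steps):
        # The step fraction grows to 1.0 on the last iteration (toy schedule).
        traj = toy_denoiser(traj, target, step_frac=1.0 / (steps - k))
        traj = project_to_gripper(traj, tcp_offset, max_jaw_width)
    return traj

# Trajectory that a base gripper (TCP offset 0.16 m, jaw opening 0.10 m) might follow.
T = 8
target = np.column_stack([
    np.full(T, 0.5),             # flange x
    np.zeros(T),                 # flange y
    np.linspace(0.40, 0.17, T),  # flange z descending toward the table
    np.full(T, 0.10),            # jaw opening
])

# Adapt to a longer, narrower gripper without changing the "policy".
adapted = constrained_denoise(target, tcp_offset=0.235, max_jaw_width=0.075)
```

In this sketch the denoiser never sees the new gripper. Only the projection changes, which mirrors the division of labor described in the abstract: learned primitives from the base gripper, with gripper-specific constraints enforced at inference time.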
