Diffusion Suction Grasping with Large-Scale Parcel Dataset
Ding-Tao Huang, Xinyi He, Debei Hua, Dongfang Yu, En-Te Lin, Long Zeng
TL;DR
This work tackles robust suction grasping in cluttered parcel scenes with two contributions. The first is Parcel-Suction-Dataset, a large synthetic benchmark with 25,000 scenes and 410 million labeled suction grasp poses. The second is Diffusion-Suction, a diffusion-based framework that reframes grasp prediction as a conditional denoising process guided by 3D visual cues. The method decouples a point-cloud encoder from a lightweight diffusion head (PCDB), enabling efficient inference while learning spatial point-wise affordances from synthetic data. On both Parcel-Suction-Dataset and SuctionNet-1Billion, Diffusion-Suction achieves state-of-the-art performance and strong generalization, with ablations confirming the value of 3D normals, visibility cues, and an appropriate number of diffusion steps. The intended application is scalable, reliable parcel handling, and the authors plan to release the code and dataset publicly.
Abstract
While object suction grasping has progressed remarkably in recent years, significant challenges persist, particularly in cluttered and complex parcel handling scenarios. Two fundamental limitations hinder current approaches: (1) the lack of a comprehensive suction grasp dataset tailored to parcel manipulation tasks, and (2) insufficient adaptability to diverse object characteristics, including size variations, geometric complexity, and textural diversity. To address these challenges, we present Parcel-Suction-Dataset, a large-scale synthetic dataset containing 25,000 cluttered scenes with 410 million precisely annotated suction grasp poses. The dataset is generated by our novel geometric sampling algorithm, which efficiently generates optimal suction grasps while accounting for both physical constraints and material properties. We further propose Diffusion-Suction, a framework that reformulates suction grasp prediction as a conditional generation task using denoising diffusion probabilistic models. Our method iteratively refines random noise into suction grasp score maps, guided by visual conditioning from point cloud observations, and thereby learns spatial point-wise affordances from our synthetic dataset. Extensive experiments demonstrate that the simple yet efficient Diffusion-Suction achieves new state-of-the-art performance on both Parcel-Suction-Dataset and the public SuctionNet-1Billion benchmark.
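The iterative refinement of noise into a per-point score map can be illustrated with a minimal sketch of a conditional DDPM reverse (sampling) loop. This is not the authors' implementation: the linear noise schedule, the step count, the feature shape, and the functions `toy_clean_scores` and `toy_noise_predictor` are all assumptions made for illustration. In particular, `toy_noise_predictor` is a hypothetical stand-in for the learned denoising network; it derives its noise estimate from a fixed function of the per-point features so that the loop runs without any training.

```python
import numpy as np


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    # Linear DDPM noise schedule (assumed; the schedule is not given in the text).
    return np.linspace(beta_start, beta_end, num_steps)


def toy_clean_scores(features):
    # Hypothetical target score per point: sigmoid of the mean feature value.
    return 1.0 / (1.0 + np.exp(-features.mean(axis=1)))


def toy_noise_predictor(noisy_scores, features, t, alpha_bars):
    # Hypothetical stand-in for a trained network eps_theta(x_t, condition, t).
    # It solves x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps for eps,
    # using toy_clean_scores(features) as x_0.
    x0 = toy_clean_scores(features)
    return (noisy_scores - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])


def sample_score_map(features, predict_noise, num_steps=50, seed=0):
    # Reverse diffusion: start from Gaussian noise, one value per point,
    # and denoise step by step, conditioned on the per-point features.
    rng = np.random.default_rng(seed)
    betas = linear_beta_schedule(num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(features.shape[0])
    for t in range(num_steps - 1, -1, -1):
        eps_hat = predict_noise(x, features, t, alpha_bars)
        # Posterior mean of the standard DDPM reverse step.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            # Add sampling noise on every step except the last.
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean
    return x


# Usage with made-up features: 1024 points, 16 channels each (hypothetical shape).
point_features = np.random.default_rng(1).standard_normal((1024, 16))
scores = sample_score_map(point_features, toy_noise_predictor)
best_point = int(np.argmax(scores))  # index of the highest-scoring point
```

In a real system the noise predictor would be a trained network that consumes encoder features of the point cloud, and the highest-scoring points would be candidate suction locations. Here the loop only demonstrates the sampling mechanics under the stated assumptions.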
