Single-View Shape Completion for Robotic Grasping in Clutter

Abhishek Kashyap, Yuxuan Yang, Henrik Andreasson, Todor Stoyanov

TL;DR

The paper tackles grasping in clutter from a single view by using diffusion-based, category-level shape completion to infer complete object geometry from partial depth. It integrates open-vocabulary segmentation (LangSAM), Diffusion-SDF-based shape completion, and a GraspGen-based grasp inference module in a modular pipeline, demonstrated on real robotic hardware. Results show that completing the object shape improves grasp success across diverse household objects, outperforming a no-completion baseline by 23% and a recent diffusion-based method by 19%. The work highlights practical gains for robotic manipulation in clutter and points to future directions such as language-guided shape completion and faster inference.

Abstract

In vision-based robot manipulation, a single camera view can only capture one side of objects of interest, with additional occlusions in cluttered scenes further restricting visibility. As a result, the observed geometry is incomplete, and grasp estimation algorithms perform suboptimally. To address this limitation, we leverage diffusion models to perform category-level 3D shape completion from partial depth observations obtained from a single view, reconstructing complete object geometries to provide richer context for grasp planning. Our method focuses on common household items with diverse geometries, generating full 3D shapes that serve as input to downstream grasp inference networks. Unlike prior work, which primarily considers isolated objects or minimal clutter, we evaluate shape completion and grasping in realistic clutter scenarios with household objects. In preliminary evaluations on a cluttered scene, our approach consistently achieves higher grasp success rates, improving by 23% over a naive baseline without shape completion and by 19% over a recent state-of-the-art shape completion approach. Our code is available at https://amm.aass.oru.se/shape-completion-grasping/.
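
The "partial depth observation" that the abstract feeds into shape completion can be illustrated with a standard pinhole back-projection of the segmented depth pixels. The following is a minimal sketch and not the authors' code: the function name `backproject_masked_depth`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the assumption that depth is given in metres with zeros marking invalid pixels are all illustrative choices.

```python
import numpy as np


def backproject_masked_depth(depth_m, mask, fx, fy, cx, cy):
    """Back-project masked depth pixels into camera-frame 3D points.

    Hypothetical helper for illustration only. Assumes a pinhole camera
    with focal lengths (fx, fy) and principal point (cx, cy) in pixels,
    and a depth image in metres where non-positive or non-finite values
    mark missing measurements.
    """
    depth_m = np.asarray(depth_m, dtype=np.float64)
    mask = np.asarray(mask, dtype=bool)

    # Keep only pixels that belong to the object and carry a valid depth.
    valid = mask & np.isfinite(depth_m) & (depth_m > 0)

    # np.nonzero returns row indices (v) first, then column indices (u).
    v, u = np.nonzero(valid)
    z = depth_m[v, u]

    # Pinhole model: x = (u - cx) * z / fx, y = (v - cy) * z / fy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy

    # One row per visible surface point: an (N, 3) partial point cloud.
    return np.stack([x, y, z], axis=1)
```

The result contains only the surface visible from the camera, which is why a completion step is needed before grasp inference can reason about the occluded side of the object.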

Paper Structure

This paper contains 19 sections, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Grasping in clutter with shape completion. Left: Household objects in the robot workspace, viewed through an Intel RealSense D435i. Middle: Shape completion of the target object and grasp inference on the completed shape. Right: Grasp execution.
  • Figure 2: Overview of the proposed method. RGB information is used to segment an object of interest. The object point cloud is then fed into a diffusion model to obtain a completed surface, which then informs grasp planning. Grasps are ranked and one is selected for execution (green grasp in the figure).
  • Figure 3: Qualitative results of Diffusion-SDF at different clutter levels (easy, normal, and hard) of the ReOcS dataset [iwase2025zerograsp].
  • Figure 4: Scene configurations used in the real robot experiments.
  • Figure 5: Comparison of reconstruction quality from real-world experiments. Our approach consistently results in more plausible geometries.