
Grasping Partially Occluded Objects Using Autoencoder-Based Point Cloud Inpainting

Alexander Koebler, Ralf Gross, Florian Buettner, Ingo Thon

TL;DR

This work tackles the problem of partial occlusion in robotic grasping by introducing an autoencoder-based point cloud inpainting pipeline that reconstructs occluded object geometry from single-view scans. The pipeline converts unordered 3D data into equidistant depth images so that CNN-based segmentation and inpainting can be applied, and uses a loss function (PSBL) that blends pixel, perceptual, and style cues. Training relies solely on synthetic data from a digital twin, so the model can be developed without disrupting real production. In a real-world deployment, the system achieves a 76% grasping success rate in occluded scenarios, substantially reducing the number of discarded parts. The approach preserves the geometric features that subsequent surface-based 3D matching depends on, enabling existing grasp planners to operate robustly under occlusion and offering practical benefits for industrial manufacturing workflows.
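To make the "equidistant depth image" step concrete, here is a minimal sketch of rasterizing an unordered point cloud onto a regular x-y grid. It assumes z is the height above the conveyor belt as seen by a top-down scanner, that each pixel keeps the topmost point in its cell, and that empty cells get a fill value; the function name `to_depth_image`, the `cell` size, and the `empty` value are hypothetical and not taken from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]  # (x, y, z), z assumed to be height above the belt


def to_depth_image(points: Sequence[Point], cell: float,
                   empty: float = 0.0) -> List[List[float]]:
    """Rasterize an unordered point cloud onto an equidistant x-y grid.

    Each pixel holds the largest z (the topmost surface seen from above) of
    the points that fall into its cell; cells without points get `empty`.
    """
    if not points:
        return []
    x0 = min(p[0] for p in points)
    y0 = min(p[1] for p in points)

    def index(value: float, origin: float) -> int:
        # Same binning rule for sizing the grid and for placing points.
        return int(math.floor((value - origin) / cell))

    width = index(max(p[0] for p in points), x0) + 1
    height = index(max(p[1] for p in points), y0) + 1
    grid: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for x, y, z in points:
        row, col = index(y, y0), index(x, x0)
        # Keep only the highest point per cell.
        if grid[row][col] is None or z > grid[row][col]:
            grid[row][col] = z
    return [[empty if v is None else v for v in row] for row in grid]


points = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.5), (1.0, 0.0, 3.0), (1.0, 2.0, 0.5)]
image = to_depth_image(points, cell=1.0)
# image == [[2.5, 3.0], [0.0, 0.0], [0.0, 0.5]]
```

Once the scan is a fixed-size 2D array, standard CNN segmentation and inpainting models can consume it directly; the empty cells are exactly the regions an inpainting model would be asked to fill.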

Abstract

Flexible industrial production systems will play a central role in the future of manufacturing due to higher product individualization and customization. A key component of such systems is the robotic grasping of known or unknown objects in random positions. Real-world applications often pose challenges that might not be considered in grasping solutions tested in simulation or lab settings. The most prominent of these is partial occlusion of the target object. Occlusion can be caused by supporting structures in the camera's field of view, by sensor imprecision, or by parts occluding each other as a result of the production process. In all these cases, the resulting lack of information impairs the calculation of grasping points. In this paper, we present an algorithm that reconstructs the missing information. Our inpainting solution facilitates the real-world use of robust object matching approaches for grasping point calculation. We demonstrate its benefit by enabling an existing grasping system, embedded in a real-world industrial application, to handle occlusions in its input. With our solution, we drastically decrease the number of objects discarded by the process.

Paper Structure

This paper contains 18 sections, 5 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Investigated problem setting for point cloud inpainting. As shown on the left, the laser scanner can only capture a small portion of the geometry of the occluded object. This is insufficient for the 3D matching algorithm that calculates the grasping points. Our inpainting solution outputs the complete point cloud on the right with the reconstructed lower object. With that, the surface-based matching algorithm can estimate the position and orientation of both objects and determine corresponding grasping points.
  • Figure 2: Simulation environment for generating a synthetic training dataset of occluded scenes. In (a), the green conveyor belt and the laser scanner are shown. (b) depicts the scanning process of a single object, where the small grey cubes illustrate the beams of the laser scanner. The resulting synthetic point cloud is shown in (c).
  • Figure 3: Data processing pipeline of the deployed point cloud inpainting solution, with the resulting inputs and outputs.
  • Figure 4: Filtering steps for the real point clouds. The laser scanner captures different forms of noise and distractions in the raw input point cloud (a), such as the floor on the lower right and the wall on the left side. After cropping the distractions from the recorded view, sparse noise on the conveyor belt is left (b). By removing the sparse noise, only the dense clusters of the objects themselves remain (c).
  • Figure 5: Training architecture for the inpainting model using a VGG16 loss-network. The loss-network $E_L$ is trained unsupervised by learning an identity mapping on the depth image of the complete lower object. The inpainting model $I$ is trained to reconstruct the complete lower object from a depth image that contains the masked top object and the partially occluded lower object. The intermediate representations of the ground truth image $\psi_p^{X_{gt}}$ and the reconstructed image $\psi_p^{X_{out}}$ used for the style and perceptual losses are taken after the first four convolutional blocks.
  • ...and 3 more figures
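The two filtering steps of Figure 4 (cropping away distractions such as the floor and wall, then removing sparse noise so that only dense object clusters remain) can be sketched as follows. This is a minimal illustration under stated assumptions: the crop is an axis-aligned bounding box, and "sparse" means a point has fewer than `min_neighbors` other points within `radius`. The function names and thresholds are hypothetical; the paper's actual filter parameters are not given here.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def crop(points: Sequence[Point], lo: Point, hi: Point) -> List[Point]:
    """Keep only points inside the axis-aligned box [lo, hi], e.g. the belt region."""
    return [p for p in points if all(lo[i] <= p[i] <= hi[i] for i in range(3))]


def remove_sparse_noise(points: Sequence[Point], radius: float,
                        min_neighbors: int) -> List[Point]:
    """Drop points that have fewer than `min_neighbors` other points within `radius`.

    Brute-force O(n^2) neighbour search for clarity; a spatial index such as a
    k-d tree would be needed for full-size scans.
    """
    kept = []
    for i, p in enumerate(points):
        neighbors = sum(1 for j, q in enumerate(points)
                        if j != i and math.dist(p, q) <= radius)
        if neighbors >= min_neighbors:
            kept.append(p)
    return kept


cluster = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.1, 0.0)]
raw = cluster + [(5.0, 5.0, 0.0), (50.0, 0.0, 0.0)]  # isolated noise point + "wall" point
cropped = crop(raw, lo=(-1.0, -1.0, -1.0), hi=(10.0, 10.0, 10.0))
cleaned = remove_sparse_noise(cropped, radius=0.2, min_neighbors=2)
# cleaned == cluster
```

Cropping first is cheap and removes large distractors, which also shrinks the input to the more expensive neighbour search.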
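Figure 5 describes a loss built from the reconstructed and ground truth depth images and from the loss-network features $\psi_p^{X_{out}}$ and $\psi_p^{X_{gt}}$. The paper's exact PSBL equations are not reproduced in this summary, so the sketch below assumes the common formulation from inpainting and style-transfer work: an L1 pixel term, an L1 perceptual term on feature maps, and an L1 style term on Gram matrices normalized by $C \cdot H \cdot W$. Plain lists stand in for VGG16 activations, and the function names and unit weights are hypothetical.

```python
from typing import List, Sequence

# A feature map is stored as [channel][flattened H*W activations].
FeatureMap = Sequence[Sequence[float]]


def mean_abs(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute (L1) difference between two equally long vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def flatten(fm: FeatureMap) -> List[float]:
    return [v for channel in fm for v in channel]


def gram(fm: FeatureMap) -> List[float]:
    """Row-major C x C Gram matrix of a feature map, normalized by C * H * W."""
    c, n = len(fm), len(fm[0])
    return [sum(fm[i][k] * fm[j][k] for k in range(n)) / (c * n)
            for i in range(c) for j in range(c)]


def combined_loss(x_out: Sequence[float], x_gt: Sequence[float],
                  feats_out: Sequence[FeatureMap], feats_gt: Sequence[FeatureMap],
                  w_pixel: float = 1.0, w_perceptual: float = 1.0,
                  w_style: float = 1.0) -> float:
    """Weighted sum of pixel, perceptual and style terms.

    x_out / x_gt: flattened reconstructed and ground-truth depth images.
    feats_out / feats_gt: one feature map per loss-network layer p, standing
    in for psi_p^{X_out} and psi_p^{X_gt}.
    """
    pixel = mean_abs(x_out, x_gt)
    perceptual = sum(mean_abs(flatten(o), flatten(g))
                     for o, g in zip(feats_out, feats_gt))
    style = sum(mean_abs(gram(o), gram(g)) for o, g in zip(feats_out, feats_gt))
    return w_pixel * pixel + w_perceptual * perceptual + w_style * style


loss = combined_loss([1.0, 2.0], [1.0, 4.0], [[[1.0, 1.0]]], [[[1.0, 3.0]]])
# loss == 6.0  (pixel 1.0 + perceptual 1.0 + style 4.0)
```

The style term compares channel correlations rather than activations at fixed positions, so it rewards reconstructions whose local surface texture matches the ground truth even where exact pixel values differ.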