NeRF-Based Transparent Object Grasping Enhanced by Shape Priors

Yi Han, Zixin Lin, Dongjie Li, Lvping Chen, Yongliang Shi, Gan Ma

TL;DR

This work tackles the challenge of grasping transparent objects by combining NeRF-based panoramic scene reconstruction with a shape-prior–driven completion module and a pose-estimation step tailored for non-ideal geometries. The pipeline culminates in scene-level 6-DoF grasp predictions generated by a GraspNet-1Billion–based model, validated on real robotic hardware. Key contributions include a robust NeRF-based reconstruction for transparency, a dense shape completion approach using a pre-trained auto-decoder guided by shape priors, and demonstrated improvements in grasp quality and execution success in cluttered scenes. The approach offers practical impact by enabling reliable manipulation of transparent objects in real-world desktop environments, addressing both perception and planning under challenging optical conditions.

Abstract

Transparent object grasping remains a persistent challenge in robotics, largely due to the difficulty of acquiring precise 3D information. Conventional optical 3D sensors struggle to capture transparent objects, and machine learning methods are often hindered by their reliance on high-quality datasets. Leveraging NeRF's capability for continuous spatial opacity modeling, our proposed architecture integrates a NeRF-based approach for reconstructing the 3D information of transparent objects. Even so, parts of the reconstructed 3D information may remain incomplete. To address these deficiencies, we introduce a shape-prior-driven completion mechanism, further refined by a geometric pose estimation method we have developed. This allows us to obtain complete and reliable 3D information for transparent objects. Using this refined data, we perform scene-level grasp prediction and deploy the results on real-world robotic systems. Experimental validation demonstrates the efficacy of our architecture: it reliably captures 3D information of various transparent objects in cluttered scenes and, correspondingly, achieves high-quality, stable, and executable grasp predictions.
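
To make the idea of "continuous spatial opacity modeling" concrete, the following is a minimal sketch of one common way to turn a density field into a point cloud: sample the field on a regular grid and keep the cells whose density exceeds a threshold. The abstract does not say how the authors extract geometry from the NeRF, so this is an illustration, not their method. The names `density_to_points` and `toy_density`, the grid resolution, and the threshold value are all assumptions; `toy_density` is a hypothetical stand-in for a trained NeRF and models a thin, low-density shell such as a glass wall.

```python
import math


def density_to_points(density_fn, bounds, resolution, sigma_threshold):
    """Sample a density field on a regular grid and keep cell centres
    whose density is at least sigma_threshold (illustrative only)."""
    (x0, y0, z0), (x1, y1, z1) = bounds
    step_x = (x1 - x0) / resolution
    step_y = (y1 - y0) / resolution
    step_z = (z1 - z0) / resolution
    points = []
    for i in range(resolution):
        x = x0 + (i + 0.5) * step_x  # cell-centre coordinate
        for j in range(resolution):
            y = y0 + (j + 0.5) * step_y
            for k in range(resolution):
                z = z0 + (k + 0.5) * step_z
                if density_fn(x, y, z) >= sigma_threshold:
                    points.append((x, y, z))
    return points


def toy_density(x, y, z):
    """Hypothetical stand-in for a trained NeRF density query:
    a thin spherical shell of radius 0.5 with a low peak density."""
    r = math.sqrt(x * x + y * y + z * z)
    return 5.0 * math.exp(-(((r - 0.5) / 0.05) ** 2))


# Assumed scene bounds, resolution, and threshold for the toy example.
points = density_to_points(
    toy_density,
    bounds=((-1.0, -1.0, -1.0), (1.0, 1.0, 1.0)),
    resolution=32,
    sigma_threshold=1.0,
)
print(f"kept {len(points)} grid points near the shell")
```

The threshold is the sensitive choice in a sketch like this. Transparent surfaces tend to produce low densities, so a threshold tuned for opaque objects can discard them, which is consistent with the abstract's remark that the reconstruction may remain incomplete and need completion.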

Paper Structure

This paper contains 16 sections, 2 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The workflow of our proposed architecture. Starting with NeRF-based scene reconstruction, pose estimation is applied to incomplete transparent objects in the reconstructed results. Subsequently, shape completion is performed using an auto-decoder pre-trained with shape priors. Finally, scene-level grasp predictions for transparent objects are made, followed by validation through experiments on a real robotic system.
  • Figure 2: Pose estimation method for non-revolute symmetric objects, with a primary focus on identifying geometric key regions and extracting their orientation.
  • Figure 3: Scene point clouds obtained by various methods. Transparent object 3D data from the depth camera and COLMAP [schoenberger2016sfm, schoenberger2016mvs] is severely missing and distorted, making grasp prediction impossible. In contrast, our method yields reliable 3D information for transparent objects.
  • Figure 4: Comparison of grasp predictions before and after shape completion, with higher prediction quality following completion.
  • Figure 5: Scene-level grasp prediction for transparent objects and deployment on a real robot.
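
The caption of Figure 2 mentions extracting the orientation of geometric key regions. The page gives no details of that step, so the sketch below swaps in a generic technique, principal component analysis computed by power iteration, to show how a dominant direction can be read off a small cluster of 3D points. It should not be taken as the authors' pose estimation method. The function name `principal_axis`, the iteration count, and the synthetic "handle-like" point set are all assumptions.

```python
import math


def principal_axis(points, iterations=100):
    """Return (centroid, unit axis) of a 3D point set, where the axis is the
    dominant eigenvector of the covariance matrix, found by power iteration.
    Generic PCA, used here only as an illustrative stand-in."""
    n = len(points)
    centroid = tuple(sum(p[a] for p in points) / n for a in range(3))

    # 3x3 covariance matrix of the centred points.
    cov = [[0.0] * 3 for _ in range(3)]
    for p in points:
        d = [p[a] - centroid[a] for a in range(3)]
        for a in range(3):
            for b in range(3):
                cov[a][b] += d[a] * d[b] / n

    # Power iteration; the start vector is an arbitrary assumption and would
    # need changing if it were orthogonal to the dominant direction.
    v = [1.0, 1.0, 1.0]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(3)) for a in range(3)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("degenerate point set: zero covariance")
        v = [c / norm for c in w]
    return centroid, tuple(v)


# Synthetic, hypothetical key region: points stretched along the x-axis
# with small offsets in y and z.
region = [
    (0.1 * t, 0.01 * ((t % 3) - 1), 0.005 * ((t % 2) * 2 - 1))
    for t in range(50)
]
centroid, axis = principal_axis(region)
print("centroid:", centroid)
print("dominant direction:", axis)
```

A single principal axis is ambiguous in sign and says nothing about rotation around that axis, so a full 6-DoF pose would need further constraints beyond what this sketch provides.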