Volumetric Reconstruction From Partial Views for Task-Oriented Grasping

Fujian Yan, Hui Li, Hongsheng He

TL;DR

This work tackles task-oriented grasping for unseen objects using partial views. It introduces a Recurrent GAN (R-GAN) with an LSTM-based generator to reconstruct a full object volume $\hat{Y}$ from limited depth scans, paired with a 3D-CNN discriminator, and trained on synthetic data. Object affordances are incorporated via the AffordPose dataset, retrieving a candidate grasp by volume similarity using Chamfer Distance $d_{\text{CD}}$ and a category-level metric $d' = \min_{O_i \in c} d_{\text{CD}}(O_r,O_i)$, followed by PPO-based reinforcement learning to refine the grasp. The approach yields strong reconstruction metrics (IoU, HR, accuracy) and an overall $89\%$ success rate across four task categories on a dual-arm robot, demonstrating robust task-oriented grasping under partial views.
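The retrieval metric above can be illustrated with a minimal sketch. It assumes objects are given as $N \times 3$ point clouds and uses one common form of Chamfer Distance (the sum of the two mean squared nearest-neighbour distances); the paper's exact variant is not stated here. The function names `chamfer_distance`, `category_distance`, and `retrieve_category` are hypothetical, and choosing the category with the smallest $d'$ is an assumed usage that leaves out the action-similarity term.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point clouds a (N, 3) and b (M, 3).

    Assumed variant: mean squared nearest-neighbour distance from a to b
    plus the same quantity from b to a.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise squared Euclidean distances, shape (N, M).
    diff = a[:, None, :] - b[None, :, :]
    sq = np.einsum("ijk,ijk->ij", diff, diff)
    return sq.min(axis=1).mean() + sq.min(axis=0).mean()


def category_distance(reconstructed, category_objects):
    """d' = min over objects O_i in category c of d_CD(O_r, O_i)."""
    return min(chamfer_distance(reconstructed, obj) for obj in category_objects)


def retrieve_category(reconstructed, knowledge_base):
    """Return the category whose closest member best matches the reconstruction.

    knowledge_base is a hypothetical dict: category name -> list of point clouds.
    """
    return min(
        knowledge_base,
        key=lambda c: category_distance(reconstructed, knowledge_base[c]),
    )
```

For example, `retrieve_category(O_r, {"mug": [...], "knife": [...]})` would return the category containing the single most volume-similar object. The brute-force pairwise matrix is fine for small clouds; larger ones would typically need a spatial index such as a k-d tree.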

Abstract

Object affordance and volumetric information are essential in devising effective grasping strategies under task-specific constraints. This paper presents an approach for inferring suitable grasping strategies from limited partial views of an object. To this end, a recurrent generative adversarial network (R-GAN) is proposed whose recurrent generator uses long short-term memory (LSTM) units so that it can process a variable number of depth scans. The AffordPose knowledge dataset serves as prior knowledge for determining object affordances. Affordance retrieval is defined by volume similarity, measured via Chamfer Distance, and by action similarity. A Proximal Policy Optimization (PPO) reinforcement learning model then refines the retrieved grasp strategies for task-oriented grasping. The retrieved grasp strategies were evaluated on a dual-arm mobile manipulation robot, achieving an overall grasping accuracy of 89% across four tasks: lift, handle grasp, wrap grasp, and press.

Paper Structure

This paper contains 8 sections, 6 equations, 6 figures, 1 table.

Figures (6)

  • Figure 1: Task-oriented grasping based on volumetric reconstruction and affordance knowledge.
  • Figure 2: Part-level affordances are annotated, and the corresponding grasping strategy is then matched to each affordance.
  • Figure 3: Volumetric reconstruction results of AffordPose dataset objects.
  • Figure 4: Volumetric reconstruction results of objects in the standard dataset.
  • Figure 5: Task-oriented grasping. The first row displays the most volume-similar object from the knowledge base along with its corresponding grasp strategy. The second row shows the grasping results for the reconstructed object.
  • ...and 1 more figure