Volumetric Reconstruction From Partial Views for Task-Oriented Grasping
Fujian Yan, Hui Li, Hongsheng He
TL;DR
This work addresses task-oriented grasping of unseen objects from partial views. It introduces a recurrent generative adversarial network (R-GAN) whose LSTM-based generator reconstructs a full object volume $\hat{Y}$ from a limited number of depth scans; the generator is paired with a 3D-CNN discriminator, and the network is trained on synthetic data. Object affordances come from the AffordPose dataset: a candidate grasp is retrieved by volume similarity, measured with the Chamfer Distance $d_{\text{CD}}$ and the category-level metric $d' = \min_{O_i \in c} d_{\text{CD}}(O_r,O_i)$, the smallest distance between the query object $O_r$ and any object $O_i$ in category $c$. The retrieved grasp is then refined with PPO-based reinforcement learning. The approach reports strong reconstruction results (IoU, HR, accuracy) and an overall $89\%$ grasping success rate across four task categories on a dual-arm robot.
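For reference, one common form of the Chamfer Distance between two point sets $A$ and $B$ is given below. This is the standard symmetric, squared-distance variant and is shown only as an assumption; the summary above does not state which normalization or norm the authors use.

$$d_{\text{CD}}(A,B) = \frac{1}{|A|}\sum_{a \in A}\min_{b \in B}\lVert a-b\rVert_2^2 + \frac{1}{|B|}\sum_{b \in B}\min_{a \in A}\lVert a-b\rVert_2^2$$

Under this form, $d_{\text{CD}}(A,B) = 0$ exactly when the two sets contain the same points, and $d'$ is the value of this expression for the closest object in category $c$.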
Abstract
Object affordance and volumetric information are essential for devising effective grasping strategies under task-specific constraints. This paper presents an approach for inferring suitable grasping strategies from a limited number of partial views of an object. To this end, a recurrent generative adversarial network (R-GAN) is proposed whose recurrent generator uses long short-term memory (LSTM) units, allowing it to process a variable number of depth scans. To determine object affordances, the AffordPose dataset is used as prior knowledge. Affordance retrieval is based on volume similarity, measured by Chamfer Distance, and on action similarity. A Proximal Policy Optimization (PPO) reinforcement learning model is then used to refine the retrieved grasp strategies for task-oriented grasping. The retrieved grasp strategies were evaluated on a dual-arm mobile manipulation robot and achieved an overall grasping accuracy of 89% across four tasks: lift, handle grasp, wrap grasp, and press.
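The volume-similarity part of affordance retrieval can be illustrated with a minimal NumPy sketch. It assumes that each object is represented as an array of 3D points (for example, the centers of occupied voxels) and uses a symmetric, squared-distance Chamfer Distance; both are assumptions, not details confirmed by the abstract. The function names `chamfer_distance`, `category_distance`, and `retrieve_category` are hypothetical, and the action-similarity term is not modeled here.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer Distance between point sets a (N, 3) and b (M, 3).

    Assumed variant: mean squared nearest-neighbor distance, summed over
    both directions. The paper may use a different normalization.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Pairwise squared Euclidean distances, shape (N, M).
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    # Nearest neighbor in b for each point of a, and vice versa.
    return sq.min(axis=1).mean() + sq.min(axis=0).mean()


def category_distance(query, category_objects):
    """Category-level distance: minimum Chamfer Distance to any object in the category."""
    return min(chamfer_distance(query, obj) for obj in category_objects)


def retrieve_category(query, categories):
    """Return the name of the category whose closest object is nearest to the query.

    `categories` maps a category name to a list of point sets (hypothetical layout).
    """
    return min(categories, key=lambda name: category_distance(query, categories[name]))


if __name__ == "__main__":
    # Toy data only: two points standing in for a reconstructed object.
    query = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    categories = {
        "mug": [query + 0.1],    # slightly shifted copy, so very similar
        "knife": [query + 5.0],  # far away, so dissimilar
    }
    print(retrieve_category(query, categories))
```

The brute-force pairwise distance matrix keeps the sketch short but costs O(NM) memory; for dense volumes a k-d tree or a downsampled point set would be a more practical choice.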
