
Fast Point Cloud to Mesh Reconstruction for Deformable Object Tracking

Elham Amin Mansour, Hehui Zheng, Robert K. Katzschmann

TL;DR

This work develops a method that takes two inputs, a template mesh (the mesh of an object in its non-deformed state) and a deformed point cloud of the same object, and deforms the template mesh so that it matches the deformed point cloud.

Abstract

The world around us is full of soft objects that we perceive and deform with dexterous hand movements. For a robotic hand to control soft objects, it has to acquire online state feedback of the deforming object. While RGB-D cameras can collect occluded point clouds at a rate of 30 Hz, such point clouds do not represent a continuously trackable object surface. Hence, in this work, we developed a method that takes as input a template mesh, which is the mesh of an object in its non-deformed state, and a deformed point cloud of the same object, and then shapes the template mesh so that it matches the deformed point cloud. The reconstruction of meshes from point clouds has long been studied in computer graphics under 3D reconstruction and 4D reconstruction; however, both lack the speed and generalizability needed for robotics applications. Our model is designed using a point cloud auto-encoder and a Real-NVP architecture. Our trained model can perform mesh reconstruction and tracking at a rate of 58 Hz on a template mesh of 3000 vertices and a deformed point cloud of 5000 points, and it generalizes to the deformations of six different object categories that are assumed to be made of soft material in our experiments (scissors, hammer, foam brick, cleanser bottle, orange, and dice). The object meshes are taken from the YCB benchmark dataset. One downstream application is a control algorithm for a robotic hand that requires online feedback on the state of the manipulated object, which would allow online grasp adaptation in a closed-loop manner. Furthermore, the tracking capability of our method can support marker-free system identification of deforming objects. In future work, we will extend our trained model to generalize beyond six object categories and to real-world deforming point clouds.
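The two rates quoted in the abstract can be compared directly. The worked calculation below uses only those stated rates and assumes, since the abstract does not report them, that capture, transfer, and pre-processing latency are ignored:

```latex
\begin{aligned}
t_{\text{model}}  &= \frac{1}{58\,\text{Hz}} \approx 17.2\,\text{ms per reconstruction}, \\
t_{\text{camera}} &= \frac{1}{30\,\text{Hz}} \approx 33.3\,\text{ms per point cloud}, \\
t_{\text{model}}  &< t_{\text{camera}}.
\end{aligned}
```

Under that assumption, each point cloud from a 30 Hz RGB-D stream could be turned into a deformed mesh before the next frame arrives.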

Paper Structure

This paper contains 20 sections, 1 equation, 16 figures, and 5 tables.

Figures (16)

  • Figure 1: Our pipeline for the training stage (a) and the inference stage (b). Both stages take as input a deformed point cloud of an object and a mesh of the same object in a non-deformed state (template mesh), and reconstruct the deformed point cloud as a deformed mesh by deforming the template mesh. (a) Training stage: The auto-encoder, comprising an encoder and a decoder, takes the deformed point cloud as input and learns an encoding through a Chamfer loss that compares the decoded (reconstructed) point cloud with the ground-truth deformed point cloud. Then, the conditional Real-NVP model takes the auto-encoder's encoding and the template mesh as input and learns the coordinates of the deformed mesh using a Chamfer loss supervised by the ground-truth deformed mesh. (b) Inference stage: The encoder encodes the deformed point cloud, and the conditional Real-NVP model then takes the template mesh and the encoding as input and predicts a new coordinate for every vertex of the template mesh. Therefore, in both training and inference, the deformed mesh consists of the template mesh vertices displaced by the Real-NVP, with faces identical to those of the template mesh.
  • Figure 2: The overall architecture of our method contains an auto-encoder (top) and a conditional Real-NVP (middle). The main goal of the model is to deform the mesh of a non-deformed object (template mesh) so that it fits the deformed point cloud of the same object. The auto-encoder encodes the coordinates of a deformed point cloud into an encoding using 1×1 convolutions and pooling. The Real-NVP, which consists of coupling blocks, takes the template mesh and the auto-encoder's encoding as input. The architecture within each coupling block (purple block) is shown at the bottom. Within a coupling block, one dimension of the coordinates, chosen at random for each block, is masked to zero, and the projections of all coordinates to 128 dimensions (brown block) are concatenated with the encoding from the auto-encoder (orange block). This concatenation is shown with a yellow curly brace, where the encoding is repeated for every coordinate. Subsequently, sequential Conv1d networks called $map_{s}$ (pink block) and $map_{t}$ (dark blue block), with and without an activation function respectively, are applied to the result. The chosen dimension of the $map_{s}$ output is exponentiated and then multiplied by the corresponding dimension of the template mesh. This product (lemon green block) is added to the corresponding dimension of the $map_{t}$ output (dark blue block). The result (bright blue block) replaces the chosen dimension of the template mesh, while the other two dimensions are kept unchanged. This result is passed on to the next coupling block as the new template, so the template mesh coordinates are gradually modified to fit the deformed point cloud.
  • Figure 3: Generation of the datasets used to train and evaluate the model. (a) Each arrow pointing in a unique direction represents a unique warping field generated with the deformation code of the Occupancy Flow (OccFlow) authors [occflow]. Applying a warping field repeatedly to a mesh creates a trajectory. (b) Dataset B: Red arrows correspond to deformations unseen during training and green arrows to deformations seen during training. One trajectory containing 51 deformations is simulated for each of the six instances from the YCB benchmark dataset [ycb]; steps divisible by five are unseen and all other steps are seen. (c) Dataset D: Red arrows correspond to deformations unseen during training and green arrows to deformations seen during training. 1000 trajectories are simulated for each of the six YCB instances, of which 800 are seen and 200 are unseen.
  • Figure 4: Comparison of different pre-trained encoders trained on 8-10 ShapeNet categories (containing no deformations) that include a car category but not a donut category: (a) ground-truth mesh of the deformed donut at step 50; (b) ground-truth mesh of the deformed car at step 27; (c) decoded donut point cloud using FoldingNet [foldingnet]; (d) decoded car point cloud using FoldingNet [foldingnet]; (e) decoded donut point cloud using GRNet [xie2020grnet]; (f) decoded car point cloud using GRNet [xie2020grnet]; (g) decoded donut point cloud using PCN [yuan2018pcn]; (h) decoded car point cloud using PCN [yuan2018pcn].
  • Figure 5: Visual results of experiments eight and ten: (a) template mesh; (b) ground-truth deformed mesh; (c) predicted mesh; (d) overlay of the predicted and ground-truth deformed meshes. Each column corresponds to one deformation.
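Figure 1 trains both the auto-encoder and the Real-NVP with a Chamfer loss. A minimal NumPy sketch of a symmetric Chamfer distance follows; it assumes the common squared-Euclidean, mean-reduced variant, since the captions do not say whether the paper squares the distances or sums instead of averaging. The function name `chamfer_distance` is illustrative, not taken from the paper's code.

```python
import numpy as np


def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets p (N, 3) and q (M, 3).

    Assumption: squared Euclidean distances, averaged over each set.
    """
    # Pairwise squared distances via broadcasting, shape (N, M).
    diff = p[:, None, :] - q[None, :, :]
    d2 = np.sum(diff * diff, axis=-1)
    # Each point's squared distance to its nearest neighbour in the other set.
    p_to_q = d2.min(axis=1).mean()
    q_to_p = d2.min(axis=0).mean()
    return float(p_to_q + q_to_p)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(500, 3))
    print(chamfer_distance(cloud, cloud))        # 0.0 for identical sets
    print(chamfer_distance(cloud, cloud + 0.1))  # small positive value
```

Because the loss only matches nearest neighbours, it needs no point-to-point correspondence, which suits comparing a decoded point cloud with a ground-truth one. The dense (N, M) matrix is fine for a few thousand points but grows quadratically.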
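The coupling block in Figure 2 can be sketched as a standard Real-NVP affine coupling conditioned on the point cloud encoding, where only the chosen dimension d changes: y_d = x_d · exp(s) + t. The sketch below is a forward pass on random weights, not the authors' implementation. The following are hypothetical simplifications: `map_s` and `map_t` are single linear layers (a Conv1d with kernel size 1 is a per-point linear map), tanh is assumed as the activation of `map_s`, and the encoding size (512) and number of blocks (6) are made up. The `inverse` method illustrates the invertibility that this coupling form provides.

```python
import numpy as np

FEAT = 128  # per-point projection width (brown block in Figure 2)


class CouplingBlock:
    """One conditional affine coupling block (hypothetical, untrained weights)."""

    def __init__(self, code_dim: int, rng: np.random.Generator):
        self.d = int(rng.integers(3))  # dimension transformed by this block
        self.w_proj = rng.normal(scale=0.1, size=(3, FEAT))
        in_dim = FEAT + code_dim
        self.w_s = rng.normal(scale=0.01, size=(in_dim, 3))  # stand-in for map_s
        self.w_t = rng.normal(scale=0.01, size=(in_dim, 3))  # stand-in for map_t

    def _s_t(self, x: np.ndarray, code: np.ndarray):
        masked = x.copy()
        masked[:, self.d] = 0.0  # hide the dimension being transformed
        h = masked @ self.w_proj  # (V, 128) per-point projection
        # Repeat the global encoding for every vertex and concatenate.
        h = np.concatenate([h, np.repeat(code[None, :], len(x), axis=0)], axis=1)
        s = np.tanh(h @ self.w_s)[:, self.d]  # map_s, with activation
        t = (h @ self.w_t)[:, self.d]  # map_t, without activation
        return s, t

    def forward(self, x: np.ndarray, code: np.ndarray) -> np.ndarray:
        s, t = self._s_t(x, code)
        y = x.copy()
        y[:, self.d] = x[:, self.d] * np.exp(s) + t  # only dimension d changes
        return y

    def inverse(self, y: np.ndarray, code: np.ndarray) -> np.ndarray:
        # y equals x outside dimension d, so s and t can be recomputed exactly.
        s, t = self._s_t(y, code)
        x = y.copy()
        x[:, self.d] = (y[:, self.d] - t) * np.exp(-s)
        return x


def deform(template: np.ndarray, code: np.ndarray, blocks) -> np.ndarray:
    """Pass template vertices through the blocks; mesh faces are untouched."""
    for block in blocks:
        template = block.forward(template, code)
    return template


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = [CouplingBlock(code_dim=512, rng=rng) for _ in range(6)]
    template = rng.uniform(-0.5, 0.5, size=(3000, 3))  # stand-in template vertices
    code = rng.normal(size=512)  # stand-in for the encoder output
    print(deform(template, code, blocks).shape)  # (3000, 3)
```

Since each block moves vertices without adding or removing any, the template's face list can be reused unchanged for the output mesh, matching the Figure 1 description.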
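The seen/unseen splits in Figure 3 can be written out as a small sketch. Two assumptions are labeled in the code: Dataset B's 51 steps are numbered 0 to 50, and the 200 held-out trajectories of Dataset D are picked by a seeded shuffle, since the caption does not say how they were selected. The function names are hypothetical.

```python
import random

OBJECTS = ["scissors", "hammer", "foam brick", "cleanser bottle", "orange", "dice"]


def split_dataset_b(num_steps: int = 51):
    """Dataset B: steps divisible by five are unseen, all others are seen."""
    # Assumption: steps are numbered 0 .. num_steps - 1.
    seen = [s for s in range(num_steps) if s % 5 != 0]
    unseen = [s for s in range(num_steps) if s % 5 == 0]
    return seen, unseen


def split_dataset_d(num_traj: int = 1000, num_seen: int = 800, seed: int = 0):
    """Dataset D: num_seen trajectories are seen, the rest are unseen."""
    # Assumption: held-out trajectories are chosen by a seeded random shuffle.
    ids = list(range(num_traj))
    random.Random(seed).shuffle(ids)
    return sorted(ids[:num_seen]), sorted(ids[num_seen:])


if __name__ == "__main__":
    seen_b, unseen_b = split_dataset_b()
    print(len(seen_b), len(unseen_b))  # 40 11
    for i, obj in enumerate(OBJECTS):
        seen_d, unseen_d = split_dataset_d(seed=i)
        print(obj, len(seen_d), len(unseen_d))  # 800 200 for every object
```

Under the 0-to-50 numbering, Dataset B holds out 11 of the 51 steps per object, so unseen deformations are interleaved with seen ones along the same trajectory, while Dataset D holds out whole trajectories.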