
Leveraging Single-View Images for Unsupervised 3D Point Cloud Completion

Lintai Wu, Qijian Zhang, Junhui Hou, Yong Xu

TL;DR

This work tackles the challenge of completing incomplete 3D point clouds without relying on 3D ground-truth supervision. It presents Cross-PCC, a two-stage framework that fuses 3D partial-point-cloud features with single-view 2D image features to form a global representation, followed by a view-assisted refinement that uses a silhouette-based calibrator and a DGCNN-based offset predictor. Training relies on a projection-based loss that compares 2D projections to foreground silhouette samples across views, together with a partial input preservation loss, enabling effective learning from 2D supervision alone. Experiments on synthetic and real data show Cross-PCC substantially outperforms prior unsupervised methods and approaches the performance of several supervised methods, demonstrating strong practical potential for 3D reconstruction in settings with limited 3D supervision.

Abstract

Point clouds captured by scanning devices are often incomplete due to occlusion. To overcome this limitation, point cloud completion methods have been developed to predict the complete shape of an object from its partial input. These methods can be broadly classified as supervised or unsupervised. However, both categories require a large number of complete 3D point clouds, which can be difficult to capture. In this paper, we propose Cross-PCC, an unsupervised point cloud completion method that does not require any complete 3D point clouds. We use only 2D images of the complete objects, which are easier to capture than complete and clean 3D point clouds. Specifically, to take advantage of the complementary information in 2D images, we extract 2D features from a single-view RGB image and design a fusion module that fuses them with the 3D features extracted from the partial point cloud. To guide the shape of the predicted point clouds, we project the predicted points onto the 2D image plane and use the foreground pixels of the object's silhouette maps to constrain the positions of the projected points. To reduce outliers in the predicted point clouds, we propose a view calibrator that uses the single-view silhouette image to move points projected onto the background into the foreground. To the best of our knowledge, our approach is the first point cloud completion method that does not require any 3D supervision. Experimental results show that our method outperforms state-of-the-art unsupervised methods by a large margin and even achieves performance comparable to some supervised methods. We will make the source code publicly available at https://github.com/ltwu6/cross-pcc.
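
To make the projection-based supervision described in the abstract more concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the pinhole intrinsics matrix `K`, the function names, the number of sampled pixels, and the choice of a one-sided nearest-neighbor distance (each sampled foreground pixel is pulled toward its closest projected point) are all assumptions made for illustration.

```python
import numpy as np

def project_points(points, K):
    """Pinhole projection of (N, 3) camera-space points to (N, 2) pixel coords (u, v)."""
    uvw = points @ K.T                  # rows: (fx*x + cx*z, fy*y + cy*z, z)
    return uvw[:, :2] / uvw[:, 2:3]     # divide by depth

def silhouette_projection_loss(points, K, silhouette, num_samples=256, rng=None):
    """Mean distance from sampled foreground pixels to their nearest projected point.

    Hypothetical loss form: the paper's exact formulation may differ.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    proj = project_points(points, K)                          # (N, 2)
    rows, cols = np.nonzero(silhouette)                       # foreground pixel indices
    if len(rows) == 0:
        raise ValueError("silhouette has no foreground pixels")
    fg = np.stack([cols, rows], axis=1).astype(np.float64)    # (M, 2) as (u, v)
    idx = rng.choice(len(fg), size=min(num_samples, len(fg)), replace=False)
    samples = fg[idx]                                         # (S, 2)
    # Pairwise distances between sampled pixels and projected points: (S, N)
    dists = np.linalg.norm(samples[:, None, :] - proj[None, :, :], axis=2)
    return dists.min(axis=1).mean()
```

In a multi-view setting, one would presumably evaluate such a term once per view, with each view's own camera pose and silhouette, and sum or average the results; a differentiable framework would replace NumPy for actual training.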

Paper Structure

This paper contains 24 sections, 10 equations, 10 figures, and 11 tables.

Figures (10)

  • Figure 1: Flowchart of the proposed unsupervised 3D point cloud completion framework named Cross-PCC. Cross-PCC comprises two stages, i.e., Coarse Shape Reconstruction (CSR) and View-assisted Shape Refinement (VSR). The CSR stage consists of a 3D encoder, a 2D encoder, a modality fusion module and a decoder. The VSR stage is composed of two view calibrators and an offset predictor.
  • Figure 2: Architecture of the 2D encoder, 3D encoder, and modality fusion module. "SAL" and "PT" denote the Set Abstraction Layer in PointNet++ [qi2017pointnet++] and the Point Transformer [18, zhao2021point, pan20213d, guo2021pct], respectively.
  • Figure 3: Architecture of the decoder. "DC" denotes the "Deconvolution" operation. "C" in the circle means concatenation operation.
  • Figure 4: Illustration of the calibration operation. We project the predicted 3D points onto the image plane and then replace the $x$ and $y$ coordinates of each outlier with those of its nearest boundary pixel. In the camera coordinate system, where every pixel in the image plane corresponds to a ray starting from the camera, this procedure moves the outlier onto the ray of its nearest boundary pixel.
  • Figure 5: Comparison between our rendered images and existing rendered images. All images are shown at the same size. The images provided by 2DPM [20] look unrealistic, and the rendered objects in the ViPC [6] images are too small. In contrast, our rendered images are more realistic, and the objects are larger, which preserves more detail and reduces the number of background pixels.
  • ...and 5 more figures
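
The calibration operation described in the Figure 4 caption can be sketched as follows. This is an illustrative NumPy version under stated assumptions, not the paper's code: it assumes a pinhole camera with intrinsics `K`, defines a boundary pixel as a foreground pixel with at least one background 4-neighbor, and keeps each outlier's depth fixed while moving it onto the ray of its nearest boundary pixel. The function name `calibrate_points` is hypothetical.

```python
import numpy as np

def calibrate_points(points, K, silhouette):
    """Move points that project onto the background to the ray of the nearest boundary pixel."""
    h, w = silhouette.shape
    fg = silhouette.astype(bool)
    # Boundary = foreground pixels that are not fully surrounded by foreground (4-neighborhood).
    padded = np.pad(fg, 1, constant_values=False)
    interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                padded[1:-1, :-2] & padded[1:-1, 2:])
    b_rows, b_cols = np.nonzero(fg & ~interior)
    b_uv = np.stack([b_cols, b_rows], axis=1).astype(np.float64)   # (B, 2) as (u, v)

    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    z = points[:, 2]
    u = fx * points[:, 0] / z + cx                                 # projected pixel coords
    v = fy * points[:, 1] / z + cy
    ui = np.clip(np.rint(u).astype(int), 0, w - 1)
    vi = np.clip(np.rint(v).astype(int), 0, h - 1)
    inside = (u >= 0) & (u <= w - 1) & (v >= 0) & (v <= h - 1)
    outlier = ~(inside & fg[vi, ui])                               # lands on background

    out = points.astype(np.float64).copy()
    if outlier.any() and len(b_uv) > 0:
        uv_out = np.stack([u[outlier], v[outlier]], axis=1)
        d = np.linalg.norm(uv_out[:, None, :] - b_uv[None, :, :], axis=2)
        nearest = b_uv[d.argmin(axis=1)]                           # nearest boundary pixel
        # Back-project the boundary pixel at the outlier's original depth (assumption).
        out[outlier, 0] = (nearest[:, 0] - cx) * z[outlier] / fx
        out[outlier, 1] = (nearest[:, 1] - cy) * z[outlier] / fy
    return out
```

Points that already project into the foreground are returned unchanged, so applying the function to a clean prediction is a no-op.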