Semi-supervised Single-view 3D Reconstruction via Multi Shape Prior Fusion Strategy and Self-Attention

Wei Zhou, Xinzhe Shi, Yunfeng She, Kunlong Liu, Yongqin Zhang

TL;DR

A semi-supervised framework for single-view 3D reconstruction that introduces a multi shape prior fusion strategy to guide the generation of more realistic object structures.

Abstract

In single-view 3D reconstruction, traditional techniques have frequently relied on 3D annotation data that is expensive and time-consuming to acquire. Semi-supervised learning strategies address this annotation challenge by reducing the dependence on labeled data. Despite these developments, the use of this learning paradigm in 3D reconstruction tasks remains relatively limited. In this research, we created a semi-supervised framework for 3D reconstruction that introduces a multi shape prior fusion strategy to guide the generation of more realistic object structures. Additionally, to improve the quality of shape generation, we integrated a self-attention module into the traditional decoder. In benchmark tests on the ShapeNet dataset, our method substantially outperformed existing supervised learning methods at labeled ratios of 1%, 10%, and 20%. It also performed well on the real-world Pix3D dataset. In comprehensive experiments on ShapeNet, our framework demonstrated a 3.3% performance improvement over the baseline, and ablation studies further confirmed the effectiveness of our approach. Our code is available at https://github.com/NWUzhouwei/SSMP

Paper Structure

This paper contains 13 sections, 20 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Comparison of reconstruction results using the initial spherical point cloud and the fusion point cloud. With the initial spherical point cloud (a), the reconstruction noticeably lacks detail, particularly in the bookshelf layers. In contrast, the fusion point cloud (b) is produced by the multi shape prior strategy, which integrates multiple point clouds into one.
  • Figure 2: Differences between supervised, semi-supervised, and unsupervised learning. (a) Supervised single-view 3D reconstruction, which requires a large amount of labeled data pairs. (b) Semi-supervised single-view 3D reconstruction: our proposed SSMP (Semi-Supervised Multi Shape Prior Fusion Reconstruction) model can predict the 3D shape of unlabeled images after training on a mix of a small amount of labeled data and unlabeled data. (c) Unsupervised single-view 3D reconstruction, which requires a large amount of pose information.
  • Figure 3: SSMP consists of two stages. In the warm-up stage, we train the 3D reconstruction network on the available supervised data, using the fusion shape point clouds as the initial point clouds. In the teacher-guided stage, we apply three sets of image-level augmentations and one set of feature-level augmentations to the unsupervised data. With its parameters fixed, the teacher generates pseudo-labels from the weakly augmented data to train the student. Meanwhile, the student learns from two sets of strongly augmented data and one set of feature-level augmented data. The knowledge that the student learns online is gradually transferred to the teacher's weights through an exponential moving average (EMA).
  • Figure 4: (a) SSP3D baseline. (b) Our proposed feature perturbation method (SSMP). "FP" denotes feature perturbation; the blue line indicates the unsupervised loss and the red line indicates the supervised loss.
  • Figure 5: Examples of single-view 3D reconstruction on the Pix3D dataset using only 10% labeled data.
  • ...and 1 more figure
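
The Figure 3 caption describes transferring the student's weights to the teacher with an exponential moving average (EMA). The sketch below illustrates only that generic update rule, teacher = decay * teacher + (1 - decay) * student, in plain Python. It is not taken from the authors' repository: the function name `ema_update`, the dictionary-of-lists parameter layout, and the decay values are assumptions made for illustration, and the paper's actual decay value is not stated in this excerpt.

```python
def ema_update(teacher, student, decay=0.999):
    """Blend student parameters into the teacher in place.

    Both arguments are hypothetical stand-ins for model weights:
    dicts mapping a parameter name to a flat list of floats.
    The decay default is an assumed value, not one from the paper.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    if teacher.keys() != student.keys():
        raise ValueError("teacher and student must share parameter names")

    for name, teacher_vals in teacher.items():
        student_vals = student[name]
        if len(teacher_vals) != len(student_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Standard EMA rule: keep `decay` of the old teacher value and
        # move the remaining (1 - decay) toward the student value.
        teacher[name] = [
            decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_vals, student_vals)
        ]
    return teacher


# Toy usage: one parameter tensor with two entries.
teacher = {"decoder.weight": [0.0, 1.0]}
student = {"decoder.weight": [1.0, 3.0]}
ema_update(teacher, student, decay=0.9)
print(teacher["decoder.weight"])
```

With a decay close to 1 the teacher changes slowly, so a single noisy student step has little effect on the pseudo-labels the teacher produces; a decay of 0 would simply copy the student.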