A Point Set Generation Network for 3D Object Reconstruction from a Single Image

Haoqiang Fan, Hao Su, Leonidas Guibas

TL;DR

This paper introduces a point-set-based approach to 3D object reconstruction from a single image. It addresses ground-truth ambiguity with a conditional shape sampler that generates multiple plausible 3D point clouds for one input image. Point clouds are chosen as the output form because they preserve the natural invariance of 3D shapes under geometric transformations, and the paper proposes an architecture and loss function tailored to sets of 3D points. Empirically, the method outperforms prior state-of-the-art methods on single-image 3D reconstruction benchmarks and performs strongly on 3D shape completion. Because it can return several hypotheses for the same image, the approach may benefit downstream tasks that need diverse 3D reconstructions from monocular input.

Abstract

The generation of 3D data by deep neural networks has been attracting increasing attention in the research community. Most existing work resorts to regular representations such as volumetric grids or collections of images; however, these representations obscure the natural invariance of 3D shapes under geometric transformations and also suffer from a number of other issues. In this paper we address the problem of 3D reconstruction from a single image, generating a straightforward form of output: point cloud coordinates. This problem raises a unique and interesting issue: the ground-truth shape for an input image may be ambiguous. Driven by this unorthodox output form and the inherent ambiguity in the ground truth, we design an architecture, loss function, and learning paradigm that are novel and effective. Our final solution is a conditional shape sampler, capable of predicting multiple plausible 3D point clouds from an input image. In experiments, our system not only outperforms state-of-the-art methods on single-image 3D reconstruction benchmarks, but also shows strong performance on 3D shape completion and a promising ability to make multiple plausible predictions.
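
The abstract does not spell out the loss function, so the following is only an illustrative sketch of what a loss "tailored to sets of 3D points" has to satisfy: its value must not depend on the order in which points are listed. One widely used set distance with this property is the Chamfer distance, shown here in a sum-of-squared-distances form. The function name `chamfer_distance`, the NumPy implementation, and the random test cloud are assumptions made for illustration, not the authors' code.

```python
import numpy as np


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N, 3) and b (M, 3).

    Illustrative form: for every point, take the squared distance to its
    nearest neighbour in the other set, then sum both directions.
    """
    # Pairwise differences via broadcasting: shape (N, M, 3).
    diff = a[:, None, :] - b[None, :, :]
    # Pairwise squared Euclidean distances: shape (N, M).
    d2 = np.sum(diff * diff, axis=-1)
    # Nearest neighbour in b for each point of a, and vice versa.
    return float(d2.min(axis=1).sum() + d2.min(axis=0).sum())


# Hypothetical data: a random cloud and the same cloud in a different order.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(128, 3))
shuffled = cloud[rng.permutation(len(cloud))]

# Reordering the points does not change the set, so the distance is zero.
print(chamfer_distance(cloud, shuffled))
```

A per-index loss such as mean squared error between `cloud` and `shuffled` would be large here even though both arrays describe the same shape, which is why an order-independent distance is a natural fit for point cloud outputs.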

Paper Structure

This paper contains 16 sections, 2 figures, 1 table.

Figures (2)

  • Figure 1: Example of caption. It is set in Roman so that mathematics (always set in Roman: $B \sin A = A \sin B$) may be included without an ugly clash.
  • Figure 2: Example of a short caption, which should be centered.