Learning to Infer Implicit Surfaces without 3D Supervision

Shichen Liu, Shunsuke Saito, Weikai Chen, Hao Li

TL;DR

The paper tackles unsupervised 3D shape learning by leveraging implicit surface representations trained solely from 2D silhouettes, addressing the lack of 3D supervision. It introduces a field probing framework with sparse 3D anchor points and rays to bridge the implicit occupancy field with image-space supervision, paired with a finite-difference-based geometric regularizer and importance weighting to enforce local smoothness near the surface. The approach achieves high-resolution, topology-flexible reconstructions from single-view cues, outperforming state-of-the-art explicit-representation baselines in 3D IoU and visual quality, and it is validated through extensive ablations. This work broadens the practical applicability of implicit surfaces for single-view 3D digitization and sets a foundation for further unsupervised, texture-aware shape modeling.

Abstract

Recent advances in 3D deep learning have shown that it is possible to train highly effective deep models for 3D shape generation directly from 2D images. This is particularly interesting since the availability of 3D models is still limited compared to the massive amount of accessible 2D images, which is invaluable for training. The representation of 3D surfaces itself is a key factor for the quality and resolution of the 3D output. While explicit representations, such as point clouds and voxels, can span a wide range of shape variations, their resolutions are often limited. Mesh-based representations are more efficient but are limited in their ability to handle varying topologies. Implicit surfaces, however, can robustly handle complex shapes and topologies and also provide flexible resolution control. We address the fundamental problem of learning implicit surfaces for shape inference without the need for 3D supervision. Despite their advantages, it remains nontrivial to (1) formulate a differentiable connection between implicit surfaces and their 2D renderings, which is needed for image-based supervision; and (2) ensure precise geometric properties and control, such as local smoothness. In particular, sampling implicit surfaces densely is known to be computationally demanding and very slow. To this end, we propose a novel ray-based field probing technique for efficient image-to-field supervision, as well as a general geometric regularizer for implicit surfaces, which provides natural shape priors in unconstrained regions. We demonstrate the effectiveness of our framework on the task of single-view image-based 3D shape digitization and show how we outperform state-of-the-art techniques both quantitatively and qualitatively.

Paper Structure

This paper contains 20 sections, 6 equations, 9 figures, 4 tables.

Figures (9)

  • Figure 1: While explicit shape representations may suffer from poor visual quality due to limited resolutions or fail to handle arbitrary topologies (a), implicit surfaces handle arbitrary topologies at high resolutions in a memory-efficient manner (b). However, in contrast to the explicit representations, it is not feasible to directly project an implicit field onto a 2D domain via perspective transformation. Thus, we introduce a field probing approach based on efficient ray sampling that enables unsupervised learning of implicit surfaces from image-based supervision.
  • Figure 2: Ray-based field probing technique. (a) A sparse set of 3D anchor points is distributed to sense the field by sampling the occupancy value at each anchor's location. (b) Each anchor is assigned a spherical supporting region to enable ray-point intersection. Anchor points with a higher probability of lying inside the object surface are marked with deeper blue. (c) Rays are cast through the sampling points $\{\mathbf{x}_i\}$ on the 2D silhouette under the camera views $\{\pi_k\}$ (blue indicates object interior and white otherwise). (d) By aggregating the information from the intersected anchor points via max pooling, one can obtain the prediction for each ray. (e) The silhouette loss is obtained by comparing the prediction with the ground-truth label in the image space.
  • Figure 3: Network architecture for unsupervised learning of implicit surfaces. The input image $I$ is first mapped to a latent feature $\mathbf{z}$ by an image encoder $g$ while the implicit decoder $f$ consumes both the latent code $\mathbf{z}$ and a query point $\mathbf{p}_j$ and predicts its occupancy probability $\phi(\mathbf{p}_j)$. With a trained network, one can generate an implicit field whose iso-surface at 0.5 depicts the inferred geometry.
  • Figure 4: 2D illustration of importance weighted geometric regularization.
  • Figure 5: Qualitative results of single-view reconstruction using different surface representations. For point cloud representation, we also visualize the meshes reconstructed from the output point cloud.
  • ...and 4 more figures
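
The probing steps in the Figure 2 caption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`point_ray_distance`, `probe_ray`, `silhouette_loss`), the fixed support radius, the default prediction of 0 for a ray that hits no anchor, and the squared-error form of the loss are all hypothetical choices made for the example. In the paper's setting the occupancy values would come from the implicit decoder $f$ rather than from a hard-coded list.

```python
import math


def point_ray_distance(point, origin, direction):
    """Distance from `point` to the ray origin + t * direction, t >= 0.

    `direction` is assumed to be unit length.
    """
    offset = [point[i] - origin[i] for i in range(3)]
    # Project onto the ray and clamp so points behind the origin are not hit.
    t = max(0.0, sum(offset[i] * direction[i] for i in range(3)))
    closest = [origin[i] + t * direction[i] for i in range(3)]
    return math.dist(point, closest)


def probe_ray(anchors, occupancies, origin, direction, radius):
    """Max-pool the occupancy of every anchor whose spherical support the ray hits.

    Returns 0.0 when no anchor is intersected (an assumption of this sketch).
    """
    hits = [
        occ
        for anchor, occ in zip(anchors, occupancies)
        if point_ray_distance(anchor, origin, direction) <= radius
    ]
    return max(hits, default=0.0)


def silhouette_loss(predictions, labels):
    """Mean squared error between ray predictions and silhouette labels.

    The squared-error form is an assumption chosen for illustration.
    """
    return sum((p - s) ** 2 for p, s in zip(predictions, labels)) / len(labels)


# Toy scene: three anchors with hand-picked occupancy probabilities.
anchors = [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0), (1.0, 0.0, 1.0)]
occupancies = [0.2, 0.9, 0.4]
radius = 0.1

# One ray through the object (label 1) and one that misses it (label 0).
rays = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
    ((5.0, 0.0, 0.0), (0.0, 0.0, 1.0)),
]
labels = [1.0, 0.0]

predictions = [probe_ray(anchors, occupancies, o, d, radius) for o, d in rays]
loss = silhouette_loss(predictions, labels)
```

The first ray passes through the supports of the two anchors on the z-axis, so max pooling returns the larger of their occupancies; the second ray intersects nothing and falls back to the default. Because only the maximum contributes to a ray's prediction, a gradient-based version of this sketch would update one anchor per ray, which is one reason a separate smoothness regularizer is useful in regions that no ray constrains.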