Neural Mesh Fusion: Unsupervised 3D Planar Surface Understanding

Farhad G. Zanjani, Hong Cai, Yinhao Zhu, Leyla Mirvakhabova, Fatih Porikli

TL;DR

Neural Mesh Fusion jointly optimizes a polygon mesh from multi-view image observations and parses the scene's 3D planar surfaces without supervision, at a significantly lower computational cost than implicit neural rendering-based scene reconstruction approaches.

Abstract

This paper presents Neural Mesh Fusion (NMF), an efficient approach for jointly optimizing a polygon mesh from multi-view image observations and parsing the 3D planar surfaces of the scene without supervision. In contrast to implicit neural representations, NMF learns to deform a surface triangle mesh and to generate an embedding for unsupervised 3D planar segmentation through gradient-based optimization directly on the surface mesh. Experiments show that NMF obtains results competitive with state-of-the-art multi-view planar reconstruction without requiring any ground-truth 3D or planar supervision. Moreover, NMF is significantly more computationally efficient than implicit neural rendering-based scene reconstruction approaches.

Paper Structure

This paper contains 9 sections, 9 equations, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Overview of our proposed Neural Mesh Fusion pipeline. The pseudo-depth and normal maps, predicted by pre-trained networks, are used for 2D plane segmentation. Sampling pixels in planar regions yields a triangular mesh fragment that is transferred into 3D space. The collection of posed mesh fragments is then fused together, guided by the explicit neural rendering process.
  • Figure 2: Camera view clustering and keyframe selection; (a) different clusters are shown in different colors, (b) example images belonging to three different clusters. Each cluster includes diverse 3D views of the objects in its FoV.
  • Figure 3: Mesh initialization; (left) input image; (middle) sampled pixels and triangular mesh of 2D planar segments; (right) initial mesh fragment shown from a different camera view.
  • Figure 4: Two examples of contrastive sampling. (left) two training images with markers indicating a sampled pixel (green striped triangle), its positive pair (green striped circle), and several negative pairs (red striped circles); (middle) the computed planar distance (PD) maps; (right) normal maps in 3D world coordinates.
  • Figure 5: Visualization: (a) an input camera image, and its corresponding (b) rendered plane instance embedding, (c) radiance, and (d-e) depth and normal geometrical fields. As shown in (b), the spherical embedding vectors distinguish plane instances in the scene.
  • ...and 3 more figures
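
The captions for Figures 1, 3, and 4 mention transferring sampled pixels into 3D space and computing planar distances. The sketch below illustrates the standard geometry behind those two steps. It is not the authors' implementation. It assumes a pinhole camera with intrinsics fx, fy, cx, cy, a camera-to-world pose (R, t), and a planar distance defined as d = n·X for a unit normal n. The summary does not confirm that the paper's PD map uses this definition. All function names are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with depth along the optical axis to camera coordinates.

    Assumes a pinhole model; the intrinsics are illustrative parameters.
    """
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return (x, y, depth)


def to_world(p_cam: Vec3, rotation: Sequence[Sequence[float]],
             translation: Vec3) -> Vec3:
    """Apply an assumed camera-to-world pose: X_world = R @ X_cam + t."""
    return tuple(
        sum(rotation[i][j] * p_cam[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def planar_distance(point: Vec3, normal: Vec3) -> float:
    """Offset d = n . X of the plane through `point` with unit normal `normal`.

    This is one common definition, assumed here for illustration.
    """
    return sum(n * p for n, p in zip(normal, point))


# Two pixels at the same depth on a fronto-parallel surface.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
origin = (0.0, 0.0, 0.0)
intrinsics = dict(fx=500.0, fy=500.0, cx=320.0, cy=240.0)

p_a = to_world(backproject(320.0, 240.0, 2.0, **intrinsics), identity, origin)
p_b = to_world(backproject(420.0, 240.0, 2.0, **intrinsics), identity, origin)

normal = (0.0, 0.0, 1.0)
d_a = planar_distance(p_a, normal)
d_b = planar_distance(p_b, normal)
```

In this toy setup both points share the same normal and the same planar distance, so under the assumed definition they would be candidates for a positive pair, whereas a point with a different normal or offset would be a negative.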