GeoTransfer : Generalizable Few-Shot Multi-View Reconstruction via Transfer Learning

Shubhendu Jena, Franck Multon, Adnane Boukhayma

TL;DR

A novel approach to sparse 3D reconstruction that leverages the expressive power of Neural Radiance Fields (NeRFs) and rapidly transfers their features to learn accurate occupancy fields. It also introduces a novel loss on volumetric rendering weights that aids the learning of these occupancy fields.

Abstract

This paper presents a novel approach for sparse 3D reconstruction that leverages the expressive power of Neural Radiance Fields (NeRFs) and the fast transfer of their features to learn accurate occupancy fields. Existing methods for 3D reconstruction from sparse inputs still struggle to capture intricate geometric details and handle occluded regions poorly. NeRFs, on the other hand, excel at modeling complex scenes but do not offer a means to extract meaningful geometry. Our proposed method offers the best of both worlds by transferring the information encoded in NeRF features to derive an accurate occupancy field representation. We use a pre-trained, generalizable, state-of-the-art NeRF network to capture detailed scene radiance information, and rapidly transfer this knowledge to train a generalizable implicit occupancy network. This process leverages the knowledge of scene geometry encoded in the generalizable NeRF prior and refines it to learn occupancy fields, yielding a more precise generalizable representation of 3D space. The transfer learning approach reduces training time dramatically, by orders of magnitude (i.e., from several days to 3.5 hours), obviating the need to train generalizable sparse surface reconstruction methods from scratch. Additionally, we introduce a novel loss on volumetric rendering weights that helps in learning accurate occupancy fields, along with a normal loss that helps in global smoothing of the occupancy fields. We evaluate our approach on the DTU dataset and demonstrate state-of-the-art reconstruction accuracy, especially in challenging scenarios with sparse input data and occluded regions. We furthermore demonstrate the generalization capabilities of our method by showing qualitative results on the BlendedMVS dataset without any retraining.

Paper Structure (28 sections, 14 equations, 9 figures, 8 tables)


Figures (9)

  • Figure 1: Overview of our transfer learning: Our final model (in light blue) comprises a tuned image encoder $\Psi$ and implicit occupancy and color decoders $f_o$ and $g_{\mathbf{c}}$. The encoder $\Psi$, the color decoder $g_{\mathbf{c}}$ and the density decoder $g_{\sigma}$ are initialized from a pretrained generalizable NeRF. Red dashed lines symbolize our tuning losses. We apply multiple regularizations to our occupancy $f_o$, while tuning the network with both the density-guided and occupancy-guided volumetric renderings.
  • Figure 1: Qualitative evaluation of MVSNeRF (Chen et al., 2021) (in red) and MVSTransfer (in blue) on DTU (Aanæs et al., 2016) using $3$ source images. Notice that our meshes (in blue) are closer to the ground-truth meshes (in gray) than those of MVSNeRF (in red).
  • Figure 2: Qualitative comparison of reconstructions from 3 input views on the DTU dataset.
  • Figure 2: Qualitative evaluation of novel-view synthesis on DTU (Aanæs et al., 2016) using $3$ source images.
  • Figure 3: Qualitative comparison of reconstructions from 3 input views on the BMVS dataset. Note that our method reconstructs detailed surfaces without any fine-tuning.
  • ...and 4 more figures
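
The Figure 1 caption above refers to density-guided and occupancy-guided volumetric renderings. As a rough illustration of what such rendering weights look like along a single ray, the following minimal NumPy sketch computes the standard NeRF weights from densities, and weights in which occupancy takes the place of the per-sample alpha (a formulation known from prior work such as UNISURF). The function names, the sample spacing `delta`, and the use of occupancy as alpha are assumptions made for illustration; this is not the authors' implementation, and it does not reproduce their loss on the rendering weights.

```python
import numpy as np

def alpha_to_weights(alpha):
    """Turn per-sample opacities into rendering weights w_i = alpha_i * T_i.

    T_i = prod_{j<i} (1 - alpha_j) is the transmittance reaching sample i.
    `alpha` has shape (..., n_samples), ordered from near to far along the ray.
    """
    ones = np.ones_like(alpha[..., :1])
    # Exclusive cumulative product: T_0 = 1, T_1 = (1 - alpha_0), ...
    trans = np.cumprod(np.concatenate([ones, 1.0 - alpha[..., :-1]], axis=-1), axis=-1)
    return alpha * trans

def density_weights(sigma, delta):
    """Standard NeRF weights from densities sigma and sample spacings delta."""
    alpha = 1.0 - np.exp(-sigma * delta)
    return alpha_to_weights(alpha)

def occupancy_weights(occ):
    """Weights with occupancy in [0, 1] used directly as alpha (illustrative assumption)."""
    return alpha_to_weights(occ)

# Toy ray: empty space, then a surface at the third sample.
occ = np.array([0.0, 0.0, 1.0, 0.5])
w_occ = occupancy_weights(occ)

# Toy ray with constant density 1 over two samples of length 0.5.
w_sigma = density_weights(np.array([1.0, 1.0]), np.array([0.5, 0.5]))
```

With a binary-like occupancy, all of the weight concentrates on the first occupied sample, which is one reason occupancy-based weights are attractive when the goal is a sharp surface rather than a soft density.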