UniSDF: Unifying Neural Representations for High-Fidelity 3D Reconstruction of Complex Scenes with Reflections

Fangjinhua Wang; Marie-Julie Rakotosaona; Michael Niemeyer; Richard Szeliski; Marc Pollefeys; Federico Tombari

TL;DR

This work proposes UniSDF, a general-purpose 3D reconstruction method for large, complex scenes with reflections. It investigates both camera-view and reflected-view color parameterizations and finds that explicitly blending the two representations in 3D space yields more geometrically accurate surfaces, especially reflective ones.

Abstract

Neural 3D scene representations have shown great potential for 3D reconstruction from 2D images. However, reconstructing real-world captures of complex scenes remains a challenge. Existing generic 3D reconstruction methods often struggle to represent fine geometric details and do not adequately model reflective surfaces of large-scale scenes. Techniques that explicitly focus on reflective surfaces can model complex and detailed reflections by exploiting better reflection parameterizations. However, we observe that these methods are often not robust in real scenarios where both non-reflective and reflective components are present. In this work, we propose UniSDF, a general-purpose 3D reconstruction method that can reconstruct large complex scenes with reflections. We investigate both camera-view and reflected-view color parameterization techniques and find that explicitly blending these representations in 3D space enables reconstruction of surfaces that are more geometrically accurate, especially for reflective surfaces. We further combine this representation with a multi-resolution grid backbone that is trained in a coarse-to-fine manner, enabling faster reconstructions than prior methods. Extensive experiments on the object-level datasets DTU and Shiny Blender, as well as the unbounded datasets Mip-NeRF 360 and Ref-NeRF real, demonstrate that our method robustly reconstructs complex large-scale scenes with fine details and reflective surfaces, leading to the best overall performance. Project page: https://fangjinhuawang.github.io/UniSDF.

Paper Structure (53 sections, 12 equations, 14 figures, 8 tables)

Introduction
Related Works
Multi-view stereo (MVS).
Neural radiance fields (NeRF).
NeRFs for reflections.
Method
NeRF Preliminaries
UniSDF
Volume rendering the SDF.
Hash Encoding with iNGP.
Camera View & Reflected View Radiance Fields.
Learned composition.
Motivation of composing radiance fields.
Training and Regularization
Coarse-to-fine training.
...and 38 more sections

Figures (14)

Figure 1: Comparison of surface normals (top) and RGB renderings (bottom) on "garden spheres" verbin2022refnerf. While the state-of-the-art methods Ref-NeRF verbin2022refnerf, ENVIDR liang2023envidr, and Neuralangelo li2023neuralangelo struggle to reconstruct reflective elements or fine geometric details, our method accurately models both, leading to high-quality mesh reconstructions of all parts of the scene. Best viewed when zoomed in.
Figure 2: Pipeline of UniSDF. We combine the camera view radiance field and the reflected view radiance field in 3D. Given a position $\mathbf{x}$, we extract iNGP features $\gamma$ and input them to an MLP $f$ that estimates a signed distance value $d$ used to compute the NeRF density. We parametrize the camera view and reflected view radiance fields with two different MLPs, $f_{cam}$ and $f_{ref}$, respectively. Finally, we learn a continuous weight field whose volume-rendered value $\mathbf{W}$ is used to compute the final color as a weighted composite of the radiance field colors $\mathbf{C}_{cam}$ and $\mathbf{C}_{ref}$ after volume rendering, Eq. \ref{eq:composition}.
Figure 3: Visualization of the color of reflected view radiance field, color of camera view radiance field, learned weight $\mathbf{W}$, composed color and surface normal on "sedan" and "garden spheres" scenes verbin2022refnerf. Our method assigns high weight (red color) for reflective surfaces, e.g., window and hood of sedan, spheres, without any supervision.
Figure 4: Qualitative comparison with BakedSDF yariv2023bakedsdf on the "bicycle" and "officebonsai" scenes of the Mip-NeRF 360 dataset mipnerf360. BakedSDF produces holes in many regions (highlighted with dotted orange boxes) and fewer details in fine structures (highlighted with red boxes), while our method reconstructs more complete surfaces and better details. Best viewed when zoomed in.
Figure 5: Qualitative comparison of surface normals with two baselines, RefV and CamV, on the "sedan" and "toycar" scenes verbin2022refnerf. Best viewed when zoomed in.
...and 9 more figures
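
The composition described in the Figure 2 caption can be illustrated with a minimal sketch for a single ray. It is not the authors' implementation. The function names (`reflect`, `render_weights`, `accumulate`, `compose`) are hypothetical, the per-sample densities, colors, and weights are made-up inputs standing in for the MLP outputs, and the convention that $\mathbf{W}$ multiplies $\mathbf{C}_{ref}$ is an assumption chosen to match Figure 3, where reflective surfaces receive high weight. The reflected direction uses the standard mirror-reflection formula.

```python
import math

def reflect(view_dir, normal):
    """Mirror a unit direction pointing toward the camera about a unit normal."""
    d = sum(v * n for v, n in zip(view_dir, normal))
    return [2.0 * d * n - v for v, n in zip(view_dir, normal)]

def render_weights(densities, deltas):
    """Standard NeRF quadrature weights: T_i * (1 - exp(-sigma_i * delta_i))."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights

def accumulate(weights, values):
    """Volume render one scalar channel along the ray."""
    return sum(w * v for w, v in zip(weights, values))

def compose(c_cam, c_ref, w):
    """Blend rendered colors; assumes w weights the reflected view color."""
    return [w * r + (1.0 - w) * c for c, r in zip(c_cam, c_ref)]

if __name__ == "__main__":
    # Made-up samples along one ray (stand-ins for network outputs).
    densities = [0.5, 4.0, 0.1]
    deltas = [0.1, 0.1, 0.1]
    cam_rgb = [[0.2, 0.2, 0.2], [0.3, 0.3, 0.3], [0.1, 0.1, 0.1]]
    ref_rgb = [[0.9, 0.8, 0.7], [0.8, 0.8, 0.9], [0.5, 0.5, 0.5]]
    blend = [0.1, 0.9, 0.2]  # per-sample values of the weight field

    wts = render_weights(densities, deltas)
    c_cam = [accumulate(wts, [s[c] for s in cam_rgb]) for c in range(3)]
    c_ref = [accumulate(wts, [s[c] for s in ref_rgb]) for c in range(3)]
    w = accumulate(wts, blend)  # rendered weight W for this pixel
    print(compose(c_cam, c_ref, w))
    print(reflect([0.0, 0.0, 1.0], [0.0, 0.0, 1.0]))
```

In this sketch both colors and the weight are volume rendered with the same quadrature weights before blending, mirroring the caption's "after volume rendering". How the SDF value is converted to density is left out here because this page does not specify it.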
