UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction

Michael Oechsle, Songyou Peng, Andreas Geiger

TL;DR

UNISURF is a unified neural framework that combines implicit surface representations with radiance-field volume rendering to reconstruct accurate geometry from multi-view images without object masks. It parameterizes the surface as an occupancy field and couples surface and volume rendering within a single model: broad volume sampling bootstraps the geometry, and the surface is then progressively refined. The resulting reconstructions are comparable to those of mask-supervised methods such as IDR and more accurate in geometry than NeRF. The approach is evaluated on DTU, BlendedMVS, and SceneNet, and ablations show that both the joint optimization of surface and volume components and the adaptive sampling schedule are necessary. The result is a practical, mask-free path to high-fidelity 3D reconstruction and fast surface extraction, with implications for scalable multi-view scene understanding.

Abstract

Neural implicit 3D representations have emerged as a powerful paradigm for reconstructing surfaces from multi-view images and synthesizing novel views. Unfortunately, existing methods such as DVR or IDR require accurate per-pixel object masks as supervision. At the same time, neural radiance fields have revolutionized novel view synthesis. However, NeRF's estimated volume density does not admit accurate surface reconstruction. Our key insight is that implicit surface models and radiance fields can be formulated in a unified way, enabling both surface and volume rendering using the same model. This unified perspective enables novel, more efficient sampling procedures and the ability to reconstruct accurate surfaces without input masks. We compare our method on the DTU, BlendedMVS, and a synthetic indoor dataset. Our experiments demonstrate that we outperform NeRF in terms of reconstruction quality while performing on par with IDR without requiring masks.
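To make the unified formulation concrete, the following minimal sketch composites colors along a single ray using occupancy values directly as alpha values. The function name `composite` and the input layout are assumptions made for illustration, not the authors' code. With soft occupancies it behaves like volume rendering; with binary occupancies all weight falls on the first occupied sample, which corresponds to surface rendering.

```python
import numpy as np

def composite(occupancy, colors):
    """Alpha-composite per-sample colors along one ray, front to back.

    occupancy: (N,) values in [0, 1], used directly as alpha (assumed layout).
    colors:    (N, 3) RGB value per sample.
    Returns the rendered color and the per-sample weights.
    """
    occupancy = np.asarray(occupancy, dtype=float)
    colors = np.asarray(colors, dtype=float)
    # Transmittance before sample i: product of (1 - o_j) over all j < i.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - occupancy)[:-1]))
    weights = occupancy * transmittance
    return weights @ colors, weights

# Binary occupancy: only the first occupied sample contributes.
colors = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 1.0, 1.0]])
hard_color, hard_weights = composite([0.0, 0.0, 1.0, 1.0], colors)

# Soft occupancy: weight is spread over several samples.
soft_color, soft_weights = composite([0.5, 0.5], colors[:2])
```

In the binary case the rendered color equals the color of the third sample, so the same function covers both rendering modes.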

Paper Structure

This paper contains 16 sections, 11 equations, 8 figures, 1 table.

Figures (8)

  • Figure 1: Illustration. Implicit models based on surface rendering [Yariv2020ARXIV, Niemeyer2020CVPR] require input masks, and radiance fields [Mildenhall2020ECCV] do not optimize implicit surfaces directly. UNISURF provides a principled unified formulation, enabling accurate surface reconstruction from images without input masks.
  • Figure 2: Surface Rendering (IDR Results). State-of-the-art methods like IDR [Yariv2020ARXIV] require object masks and careful initialization for capturing accurate geometry.
  • Figure 3: Volume Rendering (NeRF Results). We show level sets of the recovered density volume for a trained NeRF model [Mildenhall2020ECCV] using different density thresholds $\sigma$.
  • Figure 4: Concept and Notation. Our rendering consists of two steps: First, we seek the surface $\mathbf{x}_s$ (green) in the occupancy field $o_\theta$. Second, we define an interval around the surface to sample points $\{\mathbf{x}_i\}$ (red) for volume rendering.
  • Figure 5: Volume vs. Surface Rendering. We compare images generated using volume rendering (VR, $\Delta = \Delta_\text{min}$) and surface rendering (SR), reporting three image metrics (PSNR$\uparrow$/SSIM$\uparrow$/LPIPS$\downarrow$). Both approaches yield similar results while surface rendering is twice as fast.
  • ...and 3 more figures
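
The two-step procedure in the Figure 4 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the 0.5 occupancy threshold, the coarse-scan-plus-bracketed-secant root finder, and the evenly spaced (rather than randomized) samples are all choices made here for clarity.

```python
import numpy as np

def find_surface_depth(occ_fn, origin, direction, near, far,
                       n_coarse=64, n_refine=8):
    """Step 1: depth t_s of the first free-to-occupied crossing, or None.

    occ_fn maps a 3D point to an occupancy in [0, 1]; the surface is assumed
    to be the 0.5 level set.
    """
    t = np.linspace(near, far, n_coarse)
    f = np.array([occ_fn(origin + ti * direction) for ti in t]) - 0.5
    # Indices where the ray passes from free space (f < 0) into the object.
    crossing = np.nonzero((f[:-1] < 0) & (f[1:] >= 0))[0]
    if crossing.size == 0:
        return None  # ray never enters the object
    i = crossing[0]
    t_lo, t_hi, f_lo, f_hi = t[i], t[i + 1], f[i], f[i + 1]
    # Bracketed secant (regula falsi) refinement inside [t_lo, t_hi].
    for _ in range(n_refine):
        t_mid = t_lo - f_lo * (t_hi - t_lo) / (f_hi - f_lo)
        f_mid = occ_fn(origin + t_mid * direction) - 0.5
        if f_mid < 0:
            t_lo, f_lo = t_mid, f_mid
        else:
            t_hi, f_hi = t_mid, f_mid
    return t_mid

def sample_interval(t_s, delta, n_samples, near, far):
    """Step 2: depths in [t_s - delta, t_s + delta], clipped to [near, far]."""
    lo = max(t_s - delta, near)
    hi = min(t_s + delta, far)
    return np.linspace(lo, hi, n_samples)

# Usage on a hypothetical soft unit sphere centered at the origin.
def sphere_occupancy(x, sharpness=10.0):
    return 1.0 / (1.0 + np.exp(-sharpness * (1.0 - np.linalg.norm(x))))

origin = np.array([0.0, 0.0, -3.0])
direction = np.array([0.0, 0.0, 1.0])
t_s = find_surface_depth(sphere_occupancy, origin, direction, near=0.0, far=5.0)
depths = sample_interval(t_s, delta=0.1, n_samples=5, near=0.0, far=5.0)
```

Shrinking `delta` over the course of training concentrates the samples around the surface; the caption of Figure 5 refers to the smallest interval as $\Delta_\text{min}$.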