UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction
Michael Oechsle, Songyou Peng, Andreas Geiger
TL;DR
UNISURF is a unified neural framework that combines implicit surface representations with radiance-field volume rendering to reconstruct accurate geometry from multi-view images without object masks. It parameterizes the surface as an occupancy field, so a single model supports both surface and volume rendering. Training starts with broad sampling along each ray, as in volume rendering, to bootstrap the geometry, then progressively concentrates samples around the estimated surface to refine it. The resulting reconstructions are comparable to those of mask-supervised methods such as IDR and more accurate in geometry than those of NeRF. The approach is validated on DTU, BlendedMVS, and SceneNet, and ablations indicate that both the joint optimization of surface and volume components and the adaptive sampling schedule are needed. The result is a practical, mask-free route to high-fidelity 3D reconstruction with more efficient sampling.
Abstract
Neural implicit 3D representations have emerged as a powerful paradigm for reconstructing surfaces from multi-view images and synthesizing novel views. Unfortunately, existing methods such as DVR or IDR require accurate per-pixel object masks as supervision. At the same time, neural radiance fields have revolutionized novel view synthesis. However, NeRF's estimated volume density does not admit accurate surface reconstruction. Our key insight is that implicit surface models and radiance fields can be formulated in a unified way, enabling both surface and volume rendering using the same model. This unified perspective enables novel, more efficient sampling procedures and the ability to reconstruct accurate surfaces without input masks. We evaluate our method on the DTU and BlendedMVS datasets and on a synthetic indoor dataset. Our experiments demonstrate that we outperform NeRF in terms of reconstruction quality while performing on par with IDR without requiring masks.
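To make the unified formulation concrete, the following is a minimal NumPy sketch of how an occupancy field can act as the alpha value when compositing colors along a ray. It is an illustration under stated assumptions, not the authors' implementation: the function name `render_ray` and the toy inputs are hypothetical, and in the actual method the occupancy and color values would come from learned networks evaluated at sample points rather than from hand-written arrays.

```python
import numpy as np

def render_ray(occupancy, colors):
    """Composite per-sample colors along one ray, using occupancy as alpha.

    occupancy: (N,) values in [0, 1], ordered from near to far.
    colors:    (N, 3) RGB value at each sample.
    Returns the composited color and the per-sample weights.
    """
    occupancy = np.asarray(occupancy, dtype=float)
    colors = np.asarray(colors, dtype=float)
    # Fraction of the ray that reaches sample i without being blocked
    # by any earlier sample: prod_{j<i} (1 - o_j), with 1 for the first sample.
    free_path = np.concatenate(([1.0], np.cumprod(1.0 - occupancy)[:-1]))
    weights = occupancy * free_path
    return weights @ colors, weights

# Toy inputs (hypothetical): four samples with distinct colors.
colors = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 1.0, 1.0]])

# Binary occupancy: all weight falls on the first occupied sample,
# which behaves like surface rendering.
hard_color, hard_weights = render_ray([0.0, 0.0, 1.0, 1.0], colors)

# Soft occupancy: weight is spread over several samples,
# which behaves like volume rendering.
soft_color, soft_weights = render_ray([0.1, 0.3, 0.6, 0.9], colors)
```

With binary occupancy the weights collapse to a one-hot vector at the first occupied sample, so the ray takes the color of the surface point. With intermediate occupancy values the same expression blends several samples, which is why one model can serve both rendering modes.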
