VF-NeRF: Learning Neural Vector Fields for Indoor Scene Reconstruction
Albert Gassol Puigjaner, Edoardo Mello Rella, Erik Sandström, Ajad Chhatkuli, Luc Van Gool
TL;DR
VF-NeRF presents a novel neural implicit representation for indoor scene reconstruction by learning a Vector Field (VF) that points toward the nearest surface. The method transforms VF into a differentiable surface density and combines it with volume rendering in a dual-MLP architecture to recover geometry and appearance from multi-view images. A hierarchical ray sampling strategy and a sliding-window density smoothing enable efficient, accurate reconstruction of large planar regions and sharp corners, with depth cues further boosting geometry accuracy. Experimental results on Replica and ScanNet show state-of-the-art performance in 3D reconstruction metrics and competitive novel-view synthesis, validating VF-NeRF’s strong inductive bias toward planar indoor structures. The work highlights a practical approach to indoor scene modeling that gracefully handles low-texture areas while preserving high-frequency details in rendered views.
Abstract
Implicit surfaces via neural radiance fields (NeRF) have shown surprising accuracy in surface reconstruction. Despite their success in reconstructing richly textured surfaces, existing methods struggle with weakly textured planar regions, which account for the majority of indoor scenes. In this paper, we address indoor dense surface reconstruction by revisiting key aspects of NeRF in order to use the recently proposed Vector Field (VF) as the implicit representation. VF is defined by the unit vector directed to the nearest surface point. It therefore flips direction at the surface and equals the explicit surface normal. Except for this flip, VF remains constant along planar surfaces and provides a strong inductive bias for representing them. Concretely, we develop a novel density-VF relationship and a training scheme that allow us to learn VF via volume rendering. By doing this, VF-NeRF can model large planar surfaces and sharp corners accurately. We show that, when depth cues are available, our method further improves and achieves state-of-the-art results in reconstructing indoor scenes and rendering novel views. We extensively evaluate VF-NeRF on indoor datasets and run ablations of its components.
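The VF definition above can be illustrated on the simplest indoor primitive, a single infinite plane. The following is a minimal sketch, not the paper's implementation: the function name `plane_vector_field`, the plane parameterization `n · x = offset`, and the sample points are assumptions made for illustration only. It shows the two properties stated in the abstract: the field is constant on each side of the plane and flips direction across it.

```python
import numpy as np


def plane_vector_field(points, normal, offset):
    """Unit vectors pointing from each point to its nearest point on the
    plane {x : n . x = offset}. Hypothetical helper for illustration."""
    n = np.asarray(normal, dtype=float)
    n = n / np.linalg.norm(n)              # make sure the normal is unit length
    points = np.asarray(points, dtype=float)
    signed_dist = points @ n - offset      # > 0 on the side the normal points to
    # The nearest surface point lies along -n for positive distances and
    # along +n for negative ones. Exactly on the plane the sign is 0 and
    # the field is left as the zero vector (it is not defined there).
    vf = -np.sign(signed_dist)[:, None] * n
    return vf, signed_dist


# Floor plane z = 0 with two points above it and one below it.
pts = np.array([[0.0, 0.0, 0.5],
                [3.0, -2.0, 0.5],
                [0.0, 0.0, -0.1]])
vf, sd = plane_vector_field(pts, normal=[0.0, 0.0, 1.0], offset=0.0)

same_side = float(vf[0] @ vf[1])   # cosine similarity on the same side
across = float(vf[0] @ vf[2])      # cosine similarity across the surface
print(same_side, across)
```

In this toy setting the cosine similarity between field values is 1 for any two points on the same side of the plane, regardless of their lateral position, and -1 across it. This is the sense in which the representation is constant along planar surfaces and changes only at the surface itself.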
