NeVStereo: A NeRF-Driven NVS-Stereo Architecture for High-Fidelity 3D Tasks
Pengcheng Chen, Yue Hu, Wenhao Li, Nicole M Gunderson, Andrew Feng, Zhenglong Sun, Peter Beerel, Eric J Seibel
TL;DR
NeVStereo presents a NeRF-driven NVS-stereo framework that jointly recovers camera poses, multi-view depth, novel-view synthesis, and surface geometry from casual RGB multi-view inputs. It introduces a multi-view confidence-guided RGB-D optimization (Mv-CG) and a NeRF-coupled bundle adjustment with iterative supervision to enforce cross-view geometric consistency and reduce surface artifacts. Across indoor, outdoor, tabletop, and aerial data, it achieves state-of-the-art performance in pose estimation, depth accuracy, NVS fidelity, and mesh quality, with strong zero-shot generalization. The approach demonstrates that integrating NeRF-based NVS with robust depth voting, refinement, and TSDF fusion can surpass traditional SfM ceilings while mitigating common NeRF artifacts, albeit with reliance on initialization and challenges under sparse views.
Abstract
In modern dense 3D reconstruction, feed-forward systems (e.g., VGGT, pi3) focus on end-to-end matching and geometry prediction but do not explicitly provide novel view synthesis (NVS). Neural rendering-based approaches offer high-fidelity NVS and detailed geometry from posed images, yet they typically assume fixed camera poses and can be sensitive to pose errors. As a result, it remains non-trivial to obtain a single framework that offers accurate poses, reliable depth, high-quality rendering, and accurate 3D surfaces from casually captured views. We present NeVStereo, a NeRF-driven NVS-stereo architecture that aims to jointly deliver camera poses, multi-view depth, novel view synthesis, and surface reconstruction from multi-view RGB-only inputs. NeVStereo combines NeRF-based NVS for stereo-friendly renderings, confidence-guided multi-view depth estimation, NeRF-coupled bundle adjustment for pose refinement, and an iterative refinement stage that updates both depth and the radiance field to improve geometric consistency. This design mitigates common NeRF-based issues such as surface stacking, artifacts, and pose-depth coupling. Across indoor, outdoor, tabletop, and aerial benchmarks, our experiments indicate that NeVStereo achieves consistently strong zero-shot performance, with up to 36% lower depth error, 10.4% improved pose accuracy, 4.5% higher NVS fidelity, and state-of-the-art mesh quality (F1 91.93%, Chamfer 4.35 mm) compared to leading existing methods.
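For readers unfamiliar with the mesh-quality metrics quoted above, the following is a minimal sketch of how a Chamfer distance and an F1 score are commonly computed between a predicted and a ground-truth point set. It is not the evaluation code used for NeVStereo: the function names, the symmetric-mean form of the Chamfer distance, and the distance threshold `tau` are illustrative assumptions, and benchmark protocols differ in their details (point sampling, units, threshold).

```python
import numpy as np


def nearest_distances(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """For each point in src (N, 3), return the distance to its nearest point in dst (M, 3).

    Brute-force O(N*M) search; adequate for small illustrative point sets.
    """
    diff = src[:, None, :] - dst[None, :, :]          # (N, M, 3) pairwise offsets
    return np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)


def chamfer_and_f1(pred: np.ndarray, gt: np.ndarray, tau: float):
    """Return (chamfer, f1) for two point sets.

    Assumed definitions (hypothetical, not taken from the paper):
      chamfer = mean of the two directed mean nearest-neighbour distances
      f1      = harmonic mean of precision and recall at threshold tau
    """
    d_pred_to_gt = nearest_distances(pred, gt)        # accuracy direction
    d_gt_to_pred = nearest_distances(gt, pred)        # completeness direction

    chamfer = 0.5 * (d_pred_to_gt.mean() + d_gt_to_pred.mean())

    precision = (d_pred_to_gt < tau).mean()           # predicted points close to GT
    recall = (d_gt_to_pred < tau).mean()              # GT points covered by prediction
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return float(chamfer), float(f1)


if __name__ == "__main__":
    gt = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
    pred = gt + np.array([0.0, 0.0, 0.1])             # prediction offset by 0.1 along z
    print(chamfer_and_f1(pred, gt, tau=0.2))
```

Under these assumed definitions, a lower Chamfer distance and a higher F1 both indicate a reconstructed surface that lies closer to the reference geometry.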
