ViiNeuS: Volumetric Initialization for Implicit Neural Surface reconstruction of urban scenes with limited image overlap
Hala Djeghim, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Désiré Sidibé
TL;DR
ViiNeuS introduces a hybrid implicit-surface framework tailored for large-scale urban driving scenes with limited image overlap. It jointly models a volumetric density field and a signed distance field, and uses a progressive volume-rendering scheme guided by self-supervised density estimation to converge quickly and produce high-fidelity reconstructions without strong geometric priors. Key contributions include the two-field architecture, probabilistic density-guided sampling, and regularization strategies that stabilize the hybrid stage. Together these yield faster training (approximately half the time of prior methods) and improved surface accuracy across KITTI-360, Pandaset, Waymo, and nuScenes. The approach produces high-quality textured meshes suitable for downstream applications and remains robust to challenging urban geometries and limited-view data. Overall, ViiNeuS advances scalable, data-efficient 3D urban scene reconstruction, with practical implications for autonomous driving research and related graphics tasks.
Abstract
Neural implicit surface representation methods have recently shown impressive 3D reconstruction results. However, existing solutions struggle to reconstruct driving scenes because of their large size, high complexity, and limited overlap between visual observations. Hence, to achieve accurate reconstructions, additional supervision data such as LiDAR, strong geometric priors, and long training times are required. To tackle these limitations, we present ViiNeuS, a new hybrid implicit surface learning method that efficiently initializes the signed distance field to reconstruct large driving scenes from 2D street-view images. ViiNeuS's hybrid architecture models two separate implicit fields: one representing the volumetric density of the scene, and another representing the signed distance to the surface. To accurately reconstruct urban outdoor driving scenarios, we introduce a novel volume-rendering strategy that relies on self-supervised probabilistic density estimation to sample points near the surface and to transition progressively from a volumetric to a surface representation. Unlike concurrent methods, our solution permits a proper and fast initialization of the signed distance field without relying on any geometric prior on the scene. Through extensive experiments on four outdoor driving datasets, we show that ViiNeuS can learn an accurate and detailed 3D surface representation of various urban scenes while being twice as fast to train as previous state-of-the-art solutions.
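
The idea of density-guided sampling near the surface can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the density field is already evaluated at coarse depths along a ray, uses the standard volume-rendering weights, and summarizes them with a single Gaussian over depth. The function names (`render_weights`, `sample_near_surface`) and the toy density profile are hypothetical.

```python
import numpy as np

def render_weights(sigma, t):
    """Standard volume-rendering weights for densities sigma at sorted depths t."""
    # Interval lengths; the last interval reuses the previous spacing.
    delta = np.diff(t, append=t[-1] + (t[-1] - t[-2]))
    alpha = 1.0 - np.exp(-sigma * delta)  # opacity of each interval
    # Transmittance: probability that the ray reaches each interval unoccluded.
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    return trans * alpha

def sample_near_surface(sigma, t, n_samples, rng):
    """Assumed scheme: fit a Gaussian to the weights and draw depths from it."""
    w = render_weights(sigma, t)
    w = w / (w.sum() + 1e-8)                  # normalize to a distribution over depth
    mean = np.sum(w * t)                      # expected surface depth
    std = np.sqrt(np.sum(w * (t - mean) ** 2)) + 1e-3
    samples = rng.normal(mean, std, size=n_samples)
    return np.sort(np.clip(samples, t[0], t[-1])), mean, std

# Toy ray: a density bump around depth 6 stands in for a surface (hypothetical data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 10.0, 101)
sigma = 50.0 * np.exp(-0.5 * ((t - 6.0) / 0.2) ** 2)
pts, mean, std = sample_near_surface(sigma, t, n_samples=16, rng=rng)
```

In this toy setting the drawn depths cluster just in front of the density bump, which is where a signed distance field would need to be evaluated most densely. How ViiNeuS actually parameterizes the density estimate and schedules the transition to the surface representation is not specified in the abstract.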
