ViiNeuS: Volumetric Initialization for Implicit Neural Surface reconstruction of urban scenes with limited image overlap

Hala Djeghim, Nathan Piasco, Moussab Bennehar, Luis Roldão, Dzmitry Tsishkou, Désiré Sidibé

TL;DR

ViiNeuS introduces a hybrid implicit-surface framework tailored for large-scale urban driving scenes with limited image overlap. It jointly models a volumetric density field and a signed distance field, and uses a progressive volume-rendering scheme guided by self-supervised density estimation to converge quickly to high-fidelity reconstructions without relying on geometric priors. Key contributions include the two-field architecture, probabilistic density-guided sampling, and regularization strategies that stabilize the hybrid stage, resulting in faster training (approximately half the time of prior methods) and improved surface accuracy across KITTI-360, Pandaset, Waymo, and nuScenes. The approach yields high-quality textured meshes suitable for downstream applications, while remaining robust to challenging urban geometries and limited-view data. Overall, ViiNeuS advances scalable, data-efficient 3D urban scene reconstruction with practical implications for autonomous driving research and related graphics tasks.

Abstract

Neural implicit surface representation methods have recently shown impressive 3D reconstruction results. However, existing solutions struggle to reconstruct driving scenes due to their large size, highly complex nature, and limited visual observation overlap. Hence, to achieve accurate reconstructions, additional supervision data such as LiDAR, strong geometric priors, and long training times are required. To tackle these limitations, we present ViiNeuS, a new hybrid implicit surface learning method that efficiently initializes the signed distance field to reconstruct large driving scenes from 2D street view images. ViiNeuS's hybrid architecture models two separate implicit fields: one representing the volumetric density of the scene, and another representing the signed distance to the surface. To accurately reconstruct urban outdoor driving scenarios, we introduce a novel volume-rendering strategy that relies on self-supervised probabilistic density estimation to sample points near the surface and to transition progressively from a volumetric to a surface representation. Unlike concurrent methods, our solution permits a proper and fast initialization of the signed distance field without relying on any geometric prior on the scene. By conducting extensive experiments on four outdoor driving datasets, we show that ViiNeuS can learn an accurate and detailed 3D surface representation of various urban scenes while being two times faster to train than previous state-of-the-art solutions.
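To make the idea of moving from a volumetric to a surface representation more concrete, here is a minimal NumPy sketch of alpha compositing along a single ray. It is an illustration under stated assumptions, not the authors' formulation: the density opacity is the standard NeRF expression, the SDF opacity follows the NeuS-style discrete formula, and the linear blend controlled by `t` as well as all function names are hypothetical.

```python
import numpy as np

def density_to_alpha(sigma, deltas):
    """Standard NeRF-style opacity: alpha_i = 1 - exp(-sigma_i * delta_i)."""
    return 1.0 - np.exp(-sigma * deltas)

def sdf_to_alpha(sdf, inv_s):
    """NeuS-style opacity from consecutive SDF samples (assumed here,
    not confirmed as the conversion used by ViiNeuS)."""
    cdf = 1.0 / (1.0 + np.exp(-inv_s * sdf))  # sigmoid Phi_s(f)
    alpha = (cdf[:-1] - cdf[1:]) / np.maximum(cdf[:-1], 1e-8)
    return np.clip(alpha, 0.0, 1.0)

def composite_weights(alpha):
    """Rendering weights w_i = alpha_i * prod_{j<i} (1 - alpha_j)."""
    transmittance = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    return alpha * transmittance

def blended_weights(sigma, sdf, deltas, inv_s, t):
    """Hypothetical linear blend: t = 0 uses density only, t = 1 uses SDF only."""
    a_density = density_to_alpha(sigma[:-1], deltas)  # one alpha per interval
    a_sdf = sdf_to_alpha(sdf, inv_s)
    return composite_weights((1.0 - t) * a_density + t * a_sdf)

# Toy ray: 5 samples, surface crossed at the middle sample (sdf = 0).
sdf = np.array([0.4, 0.2, 0.0, -0.2, -0.4])
sigma = np.zeros(5)        # empty density field, for illustration only
deltas = np.full(4, 0.2)   # spacing between consecutive samples
w_surface = blended_weights(sigma, sdf, deltas, inv_s=50.0, t=1.0)
w_volume = blended_weights(sigma, sdf, deltas, inv_s=50.0, t=0.0)
```

In this toy setup the SDF-only weights concentrate on the intervals around the zero crossing, while the density-only weights are all zero because the density field is empty.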

Paper Structure (38 sections, 10 equations, 13 figures, 5 tables)


Figures (13)

  • Figure 1: We introduce ViiNeuS, a novel SDF initialization method tailored to accurately reconstruct large-scale driving scenes from RGB images with limited overlap. Extensive experiments on popular driving datasets show the superiority of ViiNeuS's mesh (left) over previous state-of-the-art methods such as StreetSurf (right). The figure presents both the initial mesh and the textured mesh obtained with each technique.
  • Figure 2: ViiNeuS overview -- Our solution can be divided into two key components: a hybrid scene representation (Sec. \ref{sec:architecture}) and a ray-based volumetric rendering that progressively transitions from density to SDF sample alpha composition (Sec. \ref{sec:SCILLA-vol-rendering}).
  • Figure 3: ViiNeuS architecture -- Our method uses two MLP functions $\mathcal{F}_\Theta^h$ and $\mathcal{F}_\Theta^c$ to output density and SDF values ($\sigma$ and $f(x)$, respectively) along with color values given an input sample. Inspired by NeRF, we design $\mathcal{F}_\Theta^c$ to output the color given the latent vector $h$ produced by $\mathcal{F}_\Theta^h$, the viewing direction $d$, and the normal vector $\overrightarrow{n}$ obtained from the gradient of the SDF.
  • Figure 4: Qualitative experiment results on \ref{fig:qualitative/kitti} KITTI-360, \ref{fig:qualitative/pandaset} Pandaset, \ref{fig:qualitative/nuscenes} nuScenes and \ref{fig:qualitative/waymo} Waymo Open Dataset. We compare the mesh extracted from our SDF to the GOF, COLMAP, OpenMVS and StreetSurf meshes.
  • Figure 5: Chamfer distance between the LiDAR point cloud and the point clouds predicted by the density field and the SDF field at different training steps (left). P$\rightarrow$M for different hybrid-stage durations at different training steps on Seq. 23 from Pandaset (right).
  • ...and 8 more figures
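
The Chamfer distance mentioned in the Figure 5 caption can be illustrated with a short NumPy sketch. This section does not give the paper's exact metric definitions, so the version below is only one common convention (mean nearest-neighbour distance in each direction, then their average); the function name and the brute-force pairwise computation are assumptions chosen for readability, and the brute-force approach is only practical for small point clouds.

```python
import numpy as np

def chamfer_distance(pred, ref):
    """Return (pred->ref, ref->pred, symmetric mean) nearest-neighbour distances.

    pred: (N, 3) array of predicted points; ref: (M, 3) array of reference
    points (for example, LiDAR returns). Brute force, O(N * M) memory.
    """
    # Pairwise Euclidean distances via broadcasting, shape (N, M).
    dists = np.linalg.norm(pred[:, None, :] - ref[None, :, :], axis=-1)
    pred_to_ref = dists.min(axis=1).mean()  # each predicted point to nearest ref
    ref_to_pred = dists.min(axis=0).mean()  # each ref point to nearest prediction
    return pred_to_ref, ref_to_pred, 0.5 * (pred_to_ref + ref_to_pred)

# Tiny example: two predicted points, one reference point at the origin.
pred = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
ref = np.array([[0.0, 0.0, 0.0]])
p2r, r2p, sym = chamfer_distance(pred, ref)
```

Reporting the two directions separately is useful because a reconstruction can be accurate where it exists (low prediction-to-reference distance) while still missing parts of the scene (high reference-to-prediction distance).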