Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction
Tong Wu, Jiaqi Wang, Xingang Pan, Xudong Xu, Christian Theobalt, Ziwei Liu, Dahua Lin
TL;DR
Voxurf tackles two problems in neural surface reconstruction: the slow training of fully implicit methods and the noisy geometry of existing voxel-based methods. It introduces an explicit voxel-based representation trained in two stages, first recovering a coherent coarse shape and then refining fine details. A dual color network preserves the color-geometry dependency, a hierarchical geometry feature propagates information across voxels, and smoothness priors supplement both. The approach trains about 20x faster than previous fully implicit methods while delivering high-fidelity geometry and novel-view synthesis on DTU and BlendedMVS. Ablation studies validate the contributions of the dual color network and the hierarchical geometry feature, and show that the two-stage training is critical for stable, coherent reconstruction. The work demonstrates that a carefully designed explicit voxel representation can rival implicit methods in quality while dramatically reducing training time.
Abstract
Neural surface reconstruction aims to reconstruct accurate 3D surfaces from multi-view images. Previous methods based on neural volume rendering mostly train a fully implicit model with MLPs, which typically requires hours of training for a single scene. Recent efforts explore explicit volumetric representations that accelerate optimization by memorizing significant information in learnable voxel grids. However, existing voxel-based methods often struggle to reconstruct fine-grained geometry, even when combined with an SDF-based volume rendering scheme. We reveal that this is because 1) the voxel grids tend to break the color-geometry dependency that facilitates fine-geometry learning, and 2) the under-constrained voxel grids lack spatial coherence and are vulnerable to local minima. In this work, we present Voxurf, a voxel-based surface reconstruction approach that is both efficient and accurate. Voxurf addresses these issues via several key designs, including 1) a two-stage training procedure that first attains a coherent coarse shape and then recovers fine details, 2) a dual color network that maintains color-geometry dependency, and 3) a hierarchical geometry feature that encourages information propagation across voxels. Extensive experiments show that Voxurf achieves high efficiency and high quality at the same time. On the DTU benchmark, Voxurf achieves higher reconstruction quality with a 20x training speedup compared to previous fully implicit methods. Our code is available at https://github.com/wutong16/Voxurf.
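To make the idea of a learnable SDF voxel grid and a feature that looks across neighboring voxels more concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a dense cubic grid of SDF values over the unit cube, queried by trilinear interpolation, and it assumes that a "hierarchical" feature can be approximated by concatenating the SDF at a query point with the SDF sampled at axis-aligned neighbors at several voxel distances. The function names `trilinear_sample` and `hierarchical_feature` and the `levels` parameter are hypothetical and chosen only for illustration.

```python
import numpy as np

def trilinear_sample(grid, pts):
    """Sample a dense (N, N, N) scalar grid at points in [0, 1]^3.

    grid: array of shape (N, N, N) with N >= 2, e.g. SDF values.
    pts:  array of shape (P, 3); coordinates outside [0, 1] are clamped.
    """
    n = grid.shape[0]
    x = np.clip(pts, 0.0, 1.0) * (n - 1)             # continuous voxel coords
    i0 = np.minimum(np.floor(x).astype(int), n - 2)  # lower corner of the cell
    t = x - i0                                       # fractional offsets in [0, 1]
    out = np.zeros(len(pts))
    # Accumulate the weighted contribution of the 8 cell corners.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[:, 0] if dx else 1 - t[:, 0])
                     * (t[:, 1] if dy else 1 - t[:, 1])
                     * (t[:, 2] if dz else 1 - t[:, 2]))
                out += w * grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
    return out

def hierarchical_feature(grid, pts, levels=(1, 2)):
    """Illustrative multi-scale feature (an assumption, not the paper's exact design).

    Concatenates the SDF at each point with the SDF at its six axis-aligned
    neighbors at a distance of `l` voxels, for every `l` in `levels`.
    Returns an array of shape (P, 1 + 6 * len(levels)).
    """
    step = 1.0 / (grid.shape[0] - 1)                 # size of one voxel
    feats = [trilinear_sample(grid, pts)]
    for l in levels:
        for axis in range(3):
            for sign in (-1, 1):
                offset = np.zeros(3)
                offset[axis] = sign * l * step
                feats.append(trilinear_sample(grid, pts + offset))
    return np.stack(feats, axis=1)
```

In a setup like this, the concatenated vector could be passed to a small network together with the view direction, so that a gradient from one sample reaches several surrounding voxels rather than only the eight corners of its own cell. Whether this matches the paper's exact feature layout is not stated in the abstract.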
