Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction

Tong Wu, Jiaqi Wang, Xingang Pan, Xudong Xu, Christian Theobalt, Ziwei Liu, Dahua Lin

TL;DR

Voxurf tackles the slow and noisy geometry reconstruction of fully implicit neural surfaces by introducing a voxel-based explicit representation that leverages a two-stage training procedure. It combines a dual color network to preserve color-geometry dependency with a hierarchical geometry feature to enable cross-voxel information propagation, supplemented by smoothness priors. The approach achieves ~20x faster training than state-of-the-art implicit methods while delivering high-fidelity geometry and novel-view synthesis on DTU and BlendedMVS. Ablation studies validate the contributions of the dual color network and hierarchical geometry feature, and the two-stage training is shown to be critical for stable, coherent reconstruction. This work demonstrates that carefully designed explicit voxel representations can rival implicit methods in quality while dramatically reducing training time.

Abstract

Neural surface reconstruction aims to reconstruct accurate 3D surfaces based on multi-view images. Previous methods based on neural volume rendering mostly train a fully implicit model with MLPs, which typically require hours of training for a single scene. Recent efforts explore the explicit volumetric representation to accelerate the optimization via memorizing significant information with learnable voxel grids. However, existing voxel-based methods often struggle in reconstructing fine-grained geometry, even when combined with an SDF-based volume rendering scheme. We reveal that this is because 1) the voxel grids tend to break the color-geometry dependency that facilitates fine-geometry learning, and 2) the under-constrained voxel grids lack spatial coherence and are vulnerable to local minima. In this work, we present Voxurf, a voxel-based surface reconstruction approach that is both efficient and accurate. Voxurf addresses the aforementioned issues via several key designs, including 1) a two-stage training procedure that attains a coherent coarse shape and recovers fine details successively, 2) a dual color network that maintains color-geometry dependency, and 3) a hierarchical geometry feature to encourage information propagation across voxels. Extensive experiments show that Voxurf achieves high efficiency and high quality at the same time. On the DTU benchmark, Voxurf achieves higher reconstruction quality with a 20x training speedup compared to previous fully implicit methods. Our code is available at https://github.com/wutong16/Voxurf.
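The "learnable voxel grids" mentioned above store values at grid vertices, and those values have to be read at continuous sample points along each ray. The following is a minimal sketch of such a lookup, assuming trilinear interpolation over an axis-aligned bounding box. The function name `trilinear_query`, its arguments, and the demo grid are hypothetical and are not taken from the Voxurf code base.

```python
import numpy as np

def trilinear_query(grid, pts, bbox_min, bbox_max):
    """Trilinearly interpolate a scalar voxel grid at continuous 3D points.

    grid:     (Nx, Ny, Nz) values stored at voxel vertices (each N >= 2).
    pts:      (M, 3) query points inside [bbox_min, bbox_max].
    bbox_min, bbox_max: (3,) corners of the grid's bounding box (assumed layout).
    """
    grid = np.asarray(grid, dtype=np.float64)
    res = np.array(grid.shape)
    bbox_min = np.asarray(bbox_min, dtype=np.float64)
    bbox_max = np.asarray(bbox_max, dtype=np.float64)
    # Map world coordinates to continuous vertex indices in [0, N - 1].
    u = (np.asarray(pts, dtype=np.float64) - bbox_min) / (bbox_max - bbox_min) * (res - 1)
    u = np.clip(u, 0.0, res - 1)
    i0 = np.minimum(np.floor(u).astype(int), res - 2)  # lower corner of the cell
    t = u - i0                                          # fractional offset in [0, 1]
    out = np.zeros(len(u))
    # Accumulate the eight corner values, each weighted by its opposite sub-volume.
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = (np.where(dx, t[:, 0], 1 - t[:, 0])
                     * np.where(dy, t[:, 1], 1 - t[:, 1])
                     * np.where(dz, t[:, 2], 1 - t[:, 2]))
                out += w * grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
    return out

# Demo: a grid sampled from a linear function is reproduced exactly.
axis = np.linspace(-1.0, 1.0, 5)
X, Y, Z = np.meshgrid(axis, axis, axis, indexing="ij")
demo_grid = X + 2.0 * Y + 3.0 * Z
demo_pts = np.array([[0.1, -0.3, 0.7], [-0.9, 0.25, 0.0]])
demo_vals = trilinear_query(demo_grid, demo_pts, np.full(3, -1.0), np.full(3, 1.0))
```

Because each query touches only the eight vertices of its enclosing cell, a gradient step on one sample updates only those vertices. This locality is one way to read the abstract's remark that under-constrained voxel grids lack spatial coherence.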

Paper Structure (30 sections, 12 equations, 21 figures, 9 tables)

This paper contains 30 sections, 12 equations, 21 figures, 9 tables.

Figures (21)

  • Figure 1: Comparisons among different methods for surface reconstruction and novel view synthesis. (a) DVGO (v2) [sun2021direct, sun2022improved] benefits from the fastest convergence but suffers from a poor surface; (b) NeuS [wang2021neus] produces decent surfaces after a long training time, but high-frequency details are lost in both the geometry and the image; (c) the straightforward combination of DVGO and NeuS produces continuous but noisy surfaces; (d) our method achieves around a 20x speedup over NeuS and recovers high-quality surfaces and images with fine details. All training times are measured on a single Nvidia A100 GPU.
  • Figure 2: Reconstruction results from different architecture designs. The surface normal $n$ and learnable feature $f$ are both optional inputs to the color network. We show results of two cases under four settings on the left, and we zoom in to analyze the surfaces, normal fields, and feature fields on the right. Case (1) (a, c) and (b, d) show that the feature $f$ helps maintain a coherent shape, while case (2) (b, d) reveal that it discourages the reconstruction of geometry details since it disturbs the color-geometry dependency built by the normal $n$.
  • Figure 3: Overview of key components in our model. We adopt an explicit volumetric representation with an SDF voxel grid $V^{(sdf)}$ and a feature voxel grid $V^{(feat)}$. In the middle, we show the design for our dual color network, where $f^{feat}_i$ is the interpolated feature from $V^{(feat)}$ at point $p_i$, and $f^{geo}_i$ denotes the hierarchical feature constructed on the right. Here we show the multi-level sampling scheme and the region of grids that is affected by one point during optimization with different settings of levels.
  • Figure 4: Qualitative comparisons on the DTU dataset. See more scenes in supplementary materials.
  • Figure 5: Qualitative comparisons on the BlendedMVS dataset. See more scenes in supplementary materials.
  • ...and 16 more figures
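
The Figure 3 caption describes a hierarchical feature $f^{geo}_i$ built by multi-level sampling around each point $p_i$. One way to picture this, offered as a rough sketch rather than the authors' implementation, is to query the SDF at axis-aligned neighbors at several distances and concatenate the values with finite-difference normals. The level multipliers, the neighbor layout, the feature ordering, and the names `hierarchical_geo_feature` and `sphere_sdf` are all assumptions made for illustration.

```python
import numpy as np

def hierarchical_geo_feature(sdf_fn, pts, voxel_size, levels=(0.5, 1.0, 1.5)):
    """Concatenate SDF values and normals sampled at several neighbor distances.

    sdf_fn:     callable mapping (M, 3) points to (M,) signed distances.
    pts:        (M, 3) query points.
    voxel_size: edge length of one voxel; neighbors sit at level * voxel_size.
    levels:     hypothetical distance multipliers, one per hierarchy level.
    """
    pts = np.asarray(pts, dtype=np.float64)
    feats = [sdf_fn(pts)[:, None]]          # SDF at the point itself, shape (M, 1)
    for level in levels:
        h = level * voxel_size
        offsets = h * np.eye(3)             # one offset per coordinate axis
        plus = np.stack([sdf_fn(pts + o) for o in offsets], axis=1)   # (M, 3)
        minus = np.stack([sdf_fn(pts - o) for o in offsets], axis=1)  # (M, 3)
        grad = (plus - minus) / (2.0 * h)   # central-difference SDF gradient
        normal = grad / (np.linalg.norm(grad, axis=1, keepdims=True) + 1e-12)
        feats += [plus, minus, normal]      # 9 extra channels per level
    return np.concatenate(feats, axis=1)

def sphere_sdf(p, radius=0.5):
    """Analytic SDF of a sphere at the origin, standing in for a learned grid."""
    return np.linalg.norm(p, axis=1) - radius

demo_pts = np.array([[1.0, 0.0, 0.0], [0.0, 2.0, 0.0]])
geo_feat = hierarchical_geo_feature(sphere_sdf, demo_pts, voxel_size=0.1)
```

With larger levels, each sample reads from (and, during optimization, would update) voxels farther from $p_i$, which matches the caption's point about the region of the grid affected by one point growing with the number of levels.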