
HIVE: HIerarchical Volume Encoding for Neural Implicit Surface Reconstruction

Xiaodong Gu, Weihao Yuan, Heng Li, Zilong Dong, Ping Tan

TL;DR

This work addresses the lack of explicit 3D structure in neural implicit surface reconstruction by introducing a hierarchical volume encoding that couples high-resolution spatial features with low-resolution context to enforce smoothness. The method embeds eight multi-scale feature volumes into the SDF and color networks, supplemented by sparse high-resolution volumes and two regularizers to improve detail while reducing memory usage. Through a three-stage training regime and extensive experiments on DTU, EPFL, and BlendedMVS, the approach yields notable improvements in Chamfer distance and normal consistency, and enhances novel-view synthesis compared to strong baselines. Overall, the hierarchical-volume module serves as a versatile plug-in that significantly boosts implicit surface reconstruction quality in a memory-efficient manner, enabling finer geometry without sacrificing global coherence.
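The multi-scale encoding summarized above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes three things: each level is a dense grid of learnable features, features are sampled by trilinear interpolation, and the per-level features are concatenated before being fed to the SDF and color networks. The function names, the single channel per level, and the reduced number of levels in the demo are all hypothetical choices made for the example.

```python
import numpy as np

def trilinear_sample(volume, points):
    """Trilinearly interpolate a dense feature volume at points in [0, 1]^3.

    volume: (R, R, R, C) array of features stored at grid vertices.
    points: (N, 3) array of coordinates normalized to [0, 1].
    Returns an (N, C) array of interpolated features.
    """
    res = volume.shape[0]
    x = np.clip(points, 0.0, 1.0) * (res - 1)          # continuous vertex coords
    i0 = np.minimum(np.floor(x).astype(int), res - 2)  # keep i0 + 1 in range
    t = x - i0                                         # offsets in [0, 1]
    out = np.zeros((points.shape[0], volume.shape[3]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[:, 0] if dx else 1 - t[:, 0])
                     * (t[:, 1] if dy else 1 - t[:, 1])
                     * (t[:, 2] if dz else 1 - t[:, 2]))
                out += w[:, None] * volume[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
    return out

def hierarchical_encoding(volumes, points):
    """Concatenate the features sampled from every level of the hierarchy."""
    return np.concatenate([trilinear_sample(v, points) for v in volumes], axis=1)

# The paper uses eight levels (resolutions 2 to 256). This demo keeps only the
# four coarsest and assumes one channel per level, as in the Figure 4 toy example.
rng = np.random.default_rng(0)
resolutions = [2, 4, 8, 16]
volumes = [rng.normal(size=(r, r, r, 1)) for r in resolutions]
points = rng.uniform(size=(5, 3))
features = hierarchical_encoding(volumes, points)
print(features.shape)  # (5, 4): one column per level
```

A coarse level interpolates between only a few widely spaced vertices, so nearby points receive almost the same coarse feature. This matches the smoothing role the paper attributes to the low-resolution volumes. The fine levels vary from point to point and can carry detail.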

Abstract

Neural implicit surface reconstruction has become an increasingly popular approach for reconstructing detailed 3D shapes from images. In previous methods, however, the 3D scene is encoded only by MLPs, which lack an explicit 3D structure. To better represent 3D shapes, we introduce a volume encoding that explicitly encodes spatial information. We further design hierarchical volumes to encode the scene structure at multiple scales. The high-resolution volumes capture high-frequency geometric details, since spatially varying features can be learned at different 3D points. The low-resolution volumes enforce spatial consistency and keep the shape smooth, since adjacent locations share the same low-resolution feature. In addition, we adopt a sparse structure to reduce the memory consumption of the high-resolution volumes, and two regularization terms to enhance the smoothness of the results. This hierarchical volume encoding can be appended to any implicit surface reconstruction method as a plug-and-play module, and it generates smooth, clean reconstructions with more detail. Superior performance is demonstrated on the DTU, EPFL, and BlendedMVS datasets, with significant improvements on the standard metrics.
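A back-of-the-envelope parameter count makes the memory argument concrete. This sketch assumes dense grids that store one float per channel per vertex. Following the toy example in Figure 4, it also assumes one channel per level. The actual channel counts in the paper may differ, and the function name is hypothetical.

```python
def dense_param_count(resolution, channels):
    """Float parameters in a dense resolution^3 grid with `channels` per vertex."""
    return resolution ** 3 * channels

# Eight levels with resolutions 2, 4, ..., 256 (the first-stage range in Figure 3).
levels = [2 ** k for k in range(1, 9)]

# Hierarchy: one channel per level (assumption taken from the Figure 4 toy example).
hier = sum(dense_param_count(r, 1) for r in levels)

# Alternative: a single 256^3 grid with the same total width (8 channels).
single = dense_param_count(256, len(levels))

# A dense single-channel grid at 1024^3 in float32 (4 bytes per value).
dense_1024_bytes = dense_param_count(1024, 1) * 4

print(hier)                      # 19173960
print(single)                    # 134217728
print(round(single / hier, 2))   # 7.0
print(dense_1024_bytes / 2**30)  # 4.0 (GiB)
```

Under these assumptions, the eight-level hierarchy holds about 7x fewer parameters than a single 256^3 grid of the same total channel width. A single-channel dense grid at 1024^3 already needs 4 GiB in float32, which motivates storing the highest-resolution levels sparsely.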

Paper Structure (18 sections, 13 equations, 12 figures, 6 tables)

Figures (12)

  • Figure 1: Visualization of normal maps to highlight our advantages in recovering shape details.
  • Figure 2: Visual comparison of different encodings on DTU scan-24.
  • Figure 3: Method overview. In the first stage, we compute an initial result using features from volumes with resolutions from $2$ to $256$. In a later stage, we finalize the result using features from sparsified high-resolution volumes with a resolution of 512 or 1024.
  • Figure 4: A 2D toy example of the hierarchical volume encoding. Left: a single high-resolution volume, in which features with $3$ channels are used to encode two locations $p$ and $q$. Note that it captures only spatially varying features. Right: a hierarchical volume with lower feature dimensionality. Each feature has just $1$ channel, so the memory consumption is much lower. The high-resolution volume encodes spatially varying features, while the low-resolution volume enforces spatial smoothness.
  • Figure 5: Sparse high-resolution volume. The index of $-1$ fetches the last embedding (in dark gray) in $T_e$.
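The sparse lookup in Figure 5 can be illustrated with a small sketch. The caption does not fully specify the layout, so this example assumes one plausible design. A dense integer index grid stores, for each vertex, a row of the embedding table $T_e$ (called `table` here). An index of $-1$ marks vertices without their own feature, and the last row of `table` is shared by all such vertices. NumPy's negative indexing then makes `table[-1]` return that last row, matching the caption. Several details are hypothetical: the occupancy mask (a spherical shell standing in for the region near a coarse surface), the zero initialization of the shared row, and all function names. This toy also keeps the index grid dense for clarity, so it only saves memory on the feature channels.

```python
import numpy as np

def build_sparse_volume(occupied, channels, rng):
    """Create an index grid and an embedding table T_e from an occupancy mask.

    occupied: (R, R, R) bool array of vertices that get their own feature.
    Returns (index_grid, table): index_grid holds a row of `table` for each
    occupied vertex and -1 elsewhere; table has one extra, final row that is
    shared by every unoccupied vertex.
    """
    index_grid = np.full(occupied.shape, -1, dtype=np.int64)
    n = int(occupied.sum())
    index_grid[occupied] = np.arange(n)
    table = rng.normal(scale=1e-2, size=(n + 1, channels))
    table[-1] = 0.0  # shared "empty" embedding; zero init is an assumption
    return index_grid, table

def sparse_trilinear_sample(index_grid, table, points):
    """Trilinearly interpolate features at points in [0, 1]^3.

    Each grid corner is resolved through index_grid; an index of -1 selects
    the last row of `table` via NumPy negative indexing.
    """
    res = index_grid.shape[0]
    x = np.clip(points, 0.0, 1.0) * (res - 1)
    i0 = np.minimum(np.floor(x).astype(int), res - 2)  # keep i0 + 1 in range
    t = x - i0
    out = np.zeros((points.shape[0], table.shape[1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[:, 0] if dx else 1 - t[:, 0])
                     * (t[:, 1] if dy else 1 - t[:, 1])
                     * (t[:, 2] if dz else 1 - t[:, 2]))
                idx = index_grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
                out += w[:, None] * table[idx]
    return out

# Toy example: an 8^3 grid in which only a spherical shell of vertices is occupied.
R = 8
rng = np.random.default_rng(0)
coords = np.indices((R, R, R))
radius = np.linalg.norm(coords - (R - 1) / 2, axis=0)
occupied = np.abs(radius - 2.5) < 1.0
index_grid, table = build_sparse_volume(occupied, channels=4, rng=rng)

feats = sparse_trilinear_sample(index_grid, table, rng.uniform(size=(10, 3)))
print(feats.shape)  # (10, 4)
print(table.shape[0], "feature rows instead of", R ** 3)
```

Points far from the occupied region interpolate only the shared last row. Points near it mix in their own learned features, so storage grows with the occupied region rather than with the full grid volume.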