HIVE: HIerarchical Volume Encoding for Neural Implicit Surface Reconstruction
Xiaodong Gu, Weihao Yuan, Heng Li, Zilong Dong, Ping Tan
TL;DR
This work addresses the lack of explicit 3D structure in neural implicit surface reconstruction by introducing a hierarchical volume encoding that couples high-resolution spatial features with low-resolution context to enforce smoothness. The method embeds eight multi-scale feature volumes into the SDF and color networks, supplemented by sparse high-resolution volumes and two regularizers to improve detail while reducing memory usage. Through a three-stage training regime and extensive experiments on DTU, EPFL, and BlendedMVS, the approach yields notable improvements in Chamfer distance and normal consistency, and enhances novel-view synthesis compared to strong baselines. Overall, the hierarchical-volume module serves as a versatile plug-in that significantly boosts implicit surface reconstruction quality in a memory-efficient manner, enabling finer geometry without sacrificing global coherence.
Abstract
Neural implicit surface reconstruction has become a new trend in reconstructing detailed 3D shapes from images. In previous methods, however, the 3D scene is encoded only by MLPs, which have no explicit 3D structure. To better represent 3D shapes, we introduce a volume encoding that explicitly encodes spatial information. We further design hierarchical volumes to encode the scene structure at multiple scales. The high-resolution volumes capture high-frequency geometric details, since spatially varying features can be learned at different 3D points, while the low-resolution volumes enforce spatial consistency and keep the shape smooth, since adjacent locations share the same low-resolution feature. In addition, we adopt a sparse structure to reduce memory consumption at high-resolution volumes, and two regularization terms to enhance the smoothness of the results. This hierarchical volume encoding can be appended to any implicit surface reconstruction method as a plug-and-play module, and generates smooth, clean reconstructions with more details. Superior performance is demonstrated on the DTU, EPFL, and BlendedMVS datasets, with significant improvement on the standard metrics.
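
As a rough illustration of the multi-scale encoding idea, the sketch below queries several feature volumes of increasing resolution at one 3D point and concatenates the interpolated features. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`make_volume`, `trilinear`, `encode`), the resolutions, the channel count, the random initialization, and the use of trilinear interpolation over dense grids are all assumptions made for illustration. In particular, the abstract states that high-resolution volumes use a sparse structure, which this dense toy version omits.

```python
import itertools
import math
import random


def make_volume(res, channels, rng):
    """Dense volume with res^3 vertices, each storing `channels` features."""
    return [[[[rng.uniform(-1.0, 1.0) for _ in range(channels)]
              for _ in range(res)] for _ in range(res)] for _ in range(res)]


def trilinear(volume, point):
    """Interpolate features at `point` in [0, 1]^3 from the 8 surrounding vertices."""
    res = len(volume)
    channels = len(volume[0][0][0])
    base, frac = [], []
    for coord in point:
        scaled = min(max(coord, 0.0), 1.0) * (res - 1)
        lower = min(int(math.floor(scaled)), res - 2)  # keep lower + 1 in range
        base.append(lower)
        frac.append(scaled - lower)
    out = [0.0] * channels
    for corner in itertools.product((0, 1), repeat=3):
        weight = 1.0
        for axis in range(3):
            weight *= frac[axis] if corner[axis] else 1.0 - frac[axis]
        i, j, k = (base[axis] + corner[axis] for axis in range(3))
        for c in range(channels):
            out[c] += weight * volume[i][j][k][c]
    return out


def encode(volumes, point):
    """Concatenate interpolated features from every scale, coarse to fine."""
    features = []
    for volume in volumes:
        features.extend(trilinear(volume, point))
    return features


rng = random.Random(0)
RESOLUTIONS = (2, 4, 8)  # hypothetical values, kept tiny for illustration
CHANNELS = 4             # hypothetical feature width per scale
volumes = [make_volume(res, CHANNELS, rng) for res in RESOLUTIONS]
feature = encode(volumes, (0.3, 0.6, 0.9))
```

In a full pipeline, a vector like `feature` would presumably be fed to the SDF and color networks together with the query point. Nearby points fall in the same coarse cell and so receive nearly identical low-resolution features, which is the smoothness intuition given in the abstract, while the fine grids let the features vary over short distances.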
