
Spatially-Adaptive Hash Encodings For Neural Surface Reconstruction

Thomas Walker, Octave Mariotti, Amir Vaxman, Hakan Bilen

TL;DR

The work addresses the rigidity of fixed positional encodings in neural surface reconstruction. It introduces a spatially-adaptive hash encoding that pairs an SDF feature hash grid with a separate spatial mask hash grid. The learned mask value $s_i(\mathbf{x})$ scales the features $\mathbf{f}_i$ stored at grid resolution $i$, giving the final encoding $\mathbf{h}(\mathbf{x}) = [s_1(\mathbf{x}) \cdot \mathbf{f}_1, \dots, s_N(\mathbf{x}) \cdot \mathbf{f}_N]$ and letting the network use each resolution only where the local context calls for it. Progressively unveiling higher-resolution grids during training, together with joint eikonal and curvature regularization, yields robust fitting that adds high-frequency detail where needed. Experiments on DTU and Tanks & Temples show state-of-the-art surface reconstruction, and ablations confirm the benefits of the spatial mask design and of the regularizers.
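The two regularizers can be illustrated with a minimal NumPy sketch; this is not the authors' implementation. It assumes an analytic sphere SDF in place of the learned network and central finite differences in place of automatic differentiation. The eikonal term is the standard penalty $\mathbb{E}[(\lVert \nabla f \rVert - 1)^2]$. The summary does not state the paper's exact curvature term, so the mean absolute Laplacian used here is an assumed stand-in. All function names are hypothetical.

```python
import numpy as np


def sphere_sdf(x, radius=0.5):
    """Signed distance to a sphere centred at the origin; x has shape (M, 3)."""
    return np.linalg.norm(x, axis=-1) - radius


def fd_gradient(sdf, x, eps=1e-4):
    """Central finite-difference gradient of sdf at points x, shape (M, 3)."""
    grads = []
    for k in range(3):
        offset = np.zeros(3)
        offset[k] = eps
        grads.append((sdf(x + offset) - sdf(x - offset)) / (2 * eps))
    return np.stack(grads, axis=-1)


def fd_laplacian(sdf, x, eps=1e-3):
    """Central finite-difference Laplacian (sum of unmixed second derivatives)."""
    center = sdf(x)
    lap = np.zeros(x.shape[0])
    for k in range(3):
        offset = np.zeros(3)
        offset[k] = eps
        lap += (sdf(x + offset) - 2 * center + sdf(x - offset)) / eps**2
    return lap


def eikonal_loss(sdf, x):
    """Penalize deviation of |grad f| from 1 (a true SDF has unit gradient)."""
    g = fd_gradient(sdf, x)
    return np.mean((np.linalg.norm(g, axis=-1) - 1.0) ** 2)


def curvature_loss(sdf, x):
    """Assumed stand-in curvature penalty: mean absolute Laplacian of the SDF."""
    return np.mean(np.abs(fd_laplacian(sdf, x)))


# Sample points on shells away from the origin, where |x| is not differentiable.
rng = np.random.default_rng(0)
dirs = rng.normal(size=(256, 3))
dirs /= np.linalg.norm(dirs, axis=-1, keepdims=True)
pts = dirs * rng.uniform(0.2, 1.0, size=(256, 1))
print(eikonal_loss(sphere_sdf, pts), curvature_loss(sphere_sdf, pts))
```

Because an exact SDF has unit gradient norm, the eikonal loss is close to zero for the sphere and close to 1 if the same field is scaled by 2. In a learned model the same terms would be evaluated on network outputs at sampled points and added to the rendering loss with some weights.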

Abstract

Positional encodings are a common component of neural scene reconstruction methods, and provide a way to bias the learning of neural fields towards coarser or finer representations. Current neural surface reconstruction methods use a "one-size-fits-all" approach to encoding, choosing a fixed set of encoding functions, and therefore bias, across all scenes. Current state-of-the-art surface reconstruction approaches leverage grid-based multi-resolution hash encoding in order to recover high-detail geometry. We propose a learned approach which allows the network to choose its encoding basis as a function of space, by masking the contribution of features stored at separate grid resolutions. The resulting spatially adaptive approach allows the network to fit a wider range of frequencies without introducing noise. We test our approach on standard benchmark surface reconstruction datasets and achieve state-of-the-art performance on two benchmark datasets.
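The masking described here, producing $\mathbf{h}(\mathbf{x}) = [s_1(\mathbf{x}) \cdot \mathbf{f}_1, \dots, s_N(\mathbf{x}) \cdot \mathbf{f}_N]$ from the TL;DR, can be sketched in NumPy under simplifying assumptions that are ours, not the paper's: dense per-level grids replace hash tables (so there are no hash collisions), each mask grid stores a logit that a sigmoid squashes to $s_i(\mathbf{x}) \in (0, 1)$, and progressive unveiling is modeled by zeroing every level at or beyond a hypothetical `num_active` counter. Function names are hypothetical.

```python
import numpy as np


def trilinear(grid, x):
    """Trilinearly interpolate a dense vertex grid.

    grid: (R, R, R, C) values at vertices (R >= 2); x: (M, 3) points in [0, 1].
    Returns (M, C).
    """
    R = grid.shape[0]
    p = x * (R - 1)
    i0 = np.clip(np.floor(p).astype(int), 0, R - 2)  # lower corner index per axis
    t = p - i0                                       # fractional offsets in [0, 1]
    out = np.zeros((x.shape[0], grid.shape[-1]))
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((t[:, 0] if dx else 1 - t[:, 0])
                     * (t[:, 1] if dy else 1 - t[:, 1])
                     * (t[:, 2] if dz else 1 - t[:, 2]))
                out += w[:, None] * grid[i0[:, 0] + dx, i0[:, 1] + dy, i0[:, 2] + dz]
    return out


def adaptive_encoding(x, feature_grids, mask_grids, num_active):
    """h(x) = [s_1(x) f_1, ..., s_N(x) f_N], with levels >= num_active hidden.

    Dense grids stand in for hash tables (assumption); mask grids hold logits.
    """
    parts = []
    for level, (fg, mg) in enumerate(zip(feature_grids, mask_grids)):
        f = trilinear(fg, x)                          # (M, C) SDF features
        s = 1.0 / (1.0 + np.exp(-trilinear(mg, x)))   # (M, 1) mask in (0, 1)
        if level >= num_active:                       # progressive unveiling
            s = np.zeros_like(s)
        parts.append(s * f)                           # scale features pointwise
    return np.concatenate(parts, axis=-1)


# Four levels of increasing resolution, two feature channels each.
rng = np.random.default_rng(0)
resolutions = [4, 8, 16, 32]
feature_grids = [rng.normal(size=(r, r, r, 2)) for r in resolutions]
mask_grids = [rng.normal(size=(r, r, r, 1)) for r in resolutions]
points = rng.uniform(size=(5, 3))
h = adaptive_encoding(points, feature_grids, mask_grids, num_active=2)
print(h.shape)
```

With only the two coarsest levels unveiled, the columns belonging to the finer levels are exactly zero. Because each mask multiplies its level's features pointwise, a mask value near 0 lets the network ignore a resolution in smooth regions while keeping it where detail is needed, which is the behavior Figure 1 illustrates.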


Paper Structure

This paper contains 20 sections, 8 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: To modulate typical hash-grid feature fields (top row), we propose to jointly learn a scalar field for each grid resolution (bottom row). Combining them yields a spatially-adaptive encoding that allows fine features in high-detail regions (microphone tip) while preserving lower-frequency embeddings in smoother regions (microphone body).
  • Figure 2: Qualitative DTU results. Our approach improves surface details as well as surface accuracy of coarser components.
  • Figure 3: Qualitative Tanks and Temples results. Our approach produces improved surface accuracy, attaining generally cleaner surfaces with improved fine details.
  • Figure 4: Renderings of spatial mask heat maps. Red indicates that the spatial mask for the corresponding grid resolution takes the value 1. Blue indicates a mask value of 0, meaning the features stored at that grid resolution are ignored in the final encoding.
  • Figure 5: Close-up comparison of details in scene 37. Left: spatial mask network with a coarse mask hash grid. Right: spatial mask network with the default grid resolutions $[d_{\min}, d_{\max}] = [5, 11]$. When using the coarsest mask hash grid, the sharp features of the scissor blade are lost.