Surface-Centric Modeling for High-Fidelity Generalizable Neural Surface Reconstruction

Rui Peng, Shihe Shen, Kaiqiang Xiong, Huachen Gao, Jianbo Jiao, Xiaodong Gu, Ronggang Wang

TL;DR

This work proposes SuRF, a new Surface-centric framework that incorporates a new Region sparsification based on a matching Field, achieving good trade-offs between performance, efficiency, and scalability. To the authors' knowledge, it is the first unsupervised method to achieve end-to-end sparsification, which is powered by the introduced matching field.

Abstract

Reconstructing high-fidelity surfaces from multi-view images, especially sparse images, is a critical and practical task that has attracted widespread attention in recent years. However, existing methods are impeded by memory constraints or by the requirement of ground-truth depths, and cannot recover satisfactory geometric details. To this end, we propose SuRF, a new Surface-centric framework that incorporates a new Region sparsification based on a matching Field, achieving good trade-offs between performance, efficiency, and scalability. To our knowledge, this is the first unsupervised method achieving end-to-end sparsification powered by the introduced matching field, which leverages the weight distribution to efficiently locate the boundary regions containing the surface. Instead of predicting an SDF value for each voxel, we present a new region sparsification approach that sparsifies the volume by judging whether each voxel is inside the surface region. In this way, our model can exploit higher-frequency features around the surface with less memory and computational consumption. Extensive experiments on multiple benchmarks containing complex large-scale scenes show that our reconstructions exhibit high-quality details and achieve new state-of-the-art performance, e.g., a 46% improvement with 80% less memory consumption. Code is available at https://github.com/prstrive/SuRF.
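The abstract states that the matching field uses a weight distribution to locate the region containing the surface. The following is a minimal sketch of that idea for a single ray, not the authors' implementation: it assumes the weights are given over a set of sampled depths, takes the expected depth as the surface estimate, and keeps a fixed-width interval around it. The function name `surface_region` and the parameter `half_width` are hypothetical and do not come from the paper.

```python
from typing import List, Tuple


def surface_region(
    depths: List[float], weights: List[float], half_width: float
) -> Tuple[float, float, float]:
    """Return (expected_depth, near, far) for one ray.

    Assumption: `weights` is a non-negative distribution over `depths`
    (sorted in increasing order). The surface estimate is the expected
    depth, and the surface region is a fixed-width interval around it,
    clamped to the sampled depth range.
    """
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must have positive total mass")
    probs = [w / total for w in weights]
    expected = sum(p * d for p, d in zip(probs, depths))
    near = max(depths[0], expected - half_width)
    far = min(depths[-1], expected + half_width)
    return expected, near, far


# Toy ray: the weights peak at depth 3.0.
depths = [1.0, 2.0, 3.0, 4.0, 5.0]
weights = [0.0, 0.1, 0.8, 0.1, 0.0]
expected, near, far = surface_region(depths, weights, half_width=0.5)
```

In this toy case the expected depth is close to 3.0, so only samples between roughly 2.5 and 3.5 would be processed further, which is the intuition behind spending memory only near the surface.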

Paper Structure (19 sections, 17 equations, 16 figures, 10 tables)

Figures (16)

  • Figure 1: Pipeline comparison with existing methods. We omit the supervised depth-fusion methods [ren2023volrecon, liang2023rethinking] and only visualize two scales or stages here for convenience. (a) The multi-stage pipeline of methods like SparseNeuS [long2022sparseneus] is not end-to-end and tends to accumulate errors: its coarse stages cannot be optimized together with the fine stages, so their errors can no longer be corrected. (b) To achieve end-to-end training, methods like GenS [peng2023gens] apply a multi-scale structure that concatenates the coarse and fine volumes, but memory constraints limit the volume resolution. (c) View-frustum-based methods like C2F2NeuS [xu2023c2f2neus] construct a separate cost volume for each view, which consumes much memory and computation, especially when there are many input views. In contrast, we design an end-to-end and sparse pipeline (d), which can leverage higher-resolution volumes with less memory and computational consumption, and in which the coarse model can be optimized together with the fine model.
  • Figure 2: Comparisons against recent state-of-the-art methods. All experiments were conducted under the same configuration, e.g., $600\times 800$ resolution and 512 rays. The reconstruction of our method is more accurate and detailed. While the memory consumption of other methods [long2022sparseneus, ren2023volrecon, liang2023rethinking, xu2023c2f2neus] increases exponentially with volume resolution, we can utilize higher-resolution feature volumes with smaller memory overhead to reconstruct higher-frequency details. Meanwhile, like [peng2023gens, long2022sparseneus], our method can directly extract meshes using Marching Cubes on the SDF; its consumption is more stable as the number of input views varies, and it is more efficient than depth-fusion methods [ren2023volrecon, liang2023rethinking].
  • Figure 3: Framework of SuRF. The multi-scale features are extracted through an FPN network to generate the global volume through our cross-scale fusion strategy. We then build our multi-scale surface-centric feature volumes through the region sparsification, which is based on the surface region extracted from the matching field. We employ color blending to estimate the appearance of points sampled by the surface sampling, and adopt volume rendering to recover the color of a pixel. Here, we omit some modules, e.g., surface sampling and cross-scale fusion, for convenience.
  • Figure 4: Illustration of our matching field. We encode the rough scene geometry into a matching volume, and the surface position of a ray can be efficiently retrieved through interpolation. For convenience, we illustrate the surface map $E_0$ corresponding to all rays of image $I_0$ in the form of a depth map $d_0$. We then leverage the warping loss $L_{wl}$ to constrain the matching field without supervision.
  • Figure 5: Illustration of region sparsification and surface sampling. For sparsification, we illustrate three situations: voxels that fall into surface regions visible from multiple viewpoints (a) will be preserved; voxels that fall into surface regions visible from only one viewpoint (b) or that are outside surface regions (c) will be pruned. The black rectangle represents the position of a voxel. For surface sampling, we sample the points for each ray within the surface region at each scale.
  • ...and 11 more figures
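
The pruning rule in the Figure 5 caption can be sketched as follows. This is a toy illustration under stated assumptions, not the paper's code: each view is modeled with a hypothetical orthographic projection that maps a voxel to a pixel and a depth, each pixel stores a `(near, far)` surface region, and "multiple viewpoints" is read as at least two views (`min_views=2`). The names `View`, `contains`, and `sparsify` are invented for this example.

```python
from typing import Callable, Dict, List, Tuple

Voxel = Tuple[int, int, int]
Pixel = Tuple[int, int]
Region = Tuple[float, float]  # (near, far) depth interval of the surface region


class View:
    """Toy view: a projection function plus per-pixel surface regions."""

    def __init__(
        self,
        project: Callable[[Voxel], Tuple[Pixel, float]],
        regions: Dict[Pixel, Region],
    ) -> None:
        self.project = project
        self.regions = regions

    def contains(self, voxel: Voxel) -> bool:
        """True if the voxel lies inside this view's surface region."""
        pixel, depth = self.project(voxel)
        region = self.regions.get(pixel)
        if region is None:
            return False
        near, far = region
        return near <= depth <= far


def sparsify(voxels: List[Voxel], views: List[View], min_views: int = 2) -> List[Voxel]:
    """Keep voxels that fall in the surface region of at least `min_views` views."""
    return [v for v in voxels if sum(view.contains(v) for view in views) >= min_views]


# View A looks along +z: pixel = (x, y), depth = z.
view_a = View(lambda v: ((v[0], v[1]), float(v[2])),
              {(1, 1): (0.5, 1.5), (2, 2): (0.5, 1.5)})
# View B looks along +x: pixel = (y, z), depth = x.
view_b = View(lambda v: ((v[1], v[2]), float(v[0])),
              {(1, 1): (0.5, 1.5)})

voxels = [
    (1, 1, 1),  # case (a): inside the surface region of both views
    (2, 2, 1),  # case (b): inside the surface region of view A only
    (1, 1, 0),  # case (c): outside every surface region
]
kept = sparsify(voxels, [view_a, view_b])
```

Only the first voxel survives, matching cases (a), (b), and (c) in the caption. A real pipeline would use calibrated perspective cameras and interpolated surface maps instead of the dictionary lookup used here.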