Improving Neural Surface Reconstruction with Feature Priors from Multi-View Image

Xinlin Ren, Chenjie Cao, Yanwei Fu, Xiangyang Xue

TL;DR

The paper tackles the limitations of photometric consistency in Neural Surface Reconstruction (NSR) by integrating feature priors from seven pretext tasks (thirteen pre-trained methods) and introducing feature-level multi-view consistency losses. It shows that priors from multi-view stereo (MVS) and image matching, especially at high feature resolutions, yield the largest NSR improvements, and that patch-wise photometric consistency extends effectively to the feature domain, enabling variants such as MVS-NeuS and Match-NeuS. Evaluation on DTU and EPFL shows that patch-wise feature losses outperform pixel-wise losses, and the approach also extends to grid-based representations such as Neuralangelo. Overall, the paper reports state-of-the-art results on these benchmarks and offers insight into which pretext priors and feature scales benefit NSR most across representations.

Abstract

Recent advancements in Neural Surface Reconstruction (NSR) have significantly improved multi-view reconstruction when coupled with volume rendering. However, relying solely on photometric consistency in image space falls short of addressing complexities posed by real-world data, including occlusions and non-Lambertian surfaces. To tackle these challenges, we investigate feature-level consistency losses, aiming to harness valuable feature priors from diverse pretext visual tasks and overcome current limitations. It remains unclear which pretext visual task is most effective for enhancing NSR. In this study, we comprehensively explore multi-view feature priors from seven pretext visual tasks, comprising thirteen methods. Our main goal is to strengthen NSR training by considering a wide range of possibilities. Additionally, we examine the impact of varying feature resolutions and evaluate both pixel-wise and patch-wise consistency losses, providing insights into effective strategies for improving NSR performance. By incorporating pre-trained representations from MVSFormer and QuadTree, our approach yields two variants, MVS-NeuS and Match-NeuS, respectively. Our results on the DTU and EPFL datasets reveal that feature priors from image matching and multi-view stereo outperform those from other pretext tasks. Moreover, we find that extending patch-wise photometric consistency to the feature level outperforms pixel-wise approaches. These findings underscore the effectiveness of these techniques in enhancing NSR outcomes.
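
To make the pixel-wise versus patch-wise distinction concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes the pixel-wise loss is one minus cosine similarity between per-pixel feature vectors, and the patch-wise loss is one minus normalized cross-correlation (NCC) computed over the pixels of each patch, per feature channel. The function names, array shapes, and the choice of cosine similarity and NCC are illustrative assumptions; the abstract does not specify the exact formulation.

```python
import numpy as np

def pixel_feature_loss(ref_feat, src_feat, eps=1e-8):
    """Hypothetical pixel-wise loss: 1 - cosine similarity.

    ref_feat, src_feat: arrays of shape (N, C), one C-dim feature per pixel.
    """
    ref_n = ref_feat / (np.linalg.norm(ref_feat, axis=-1, keepdims=True) + eps)
    src_n = src_feat / (np.linalg.norm(src_feat, axis=-1, keepdims=True) + eps)
    cosine = np.sum(ref_n * src_n, axis=-1)  # shape (N,)
    return float(np.mean(1.0 - cosine))

def patch_feature_loss(ref_patch, src_patch, eps=1e-8):
    """Hypothetical patch-wise loss: 1 - NCC over the P pixels of each patch.

    ref_patch, src_patch: arrays of shape (N, P, C), where P is the number
    of pixels in a patch (for example 9 for a 3x3 patch).
    """
    # Center each channel over the patch so the score ignores constant offsets.
    ref_c = ref_patch - ref_patch.mean(axis=1, keepdims=True)
    src_c = src_patch - src_patch.mean(axis=1, keepdims=True)
    cov = np.sum(ref_c * src_c, axis=1)  # shape (N, C)
    denom = np.sqrt(np.sum(ref_c**2, axis=1) * np.sum(src_c**2, axis=1)) + eps
    ncc = cov / denom  # per-patch, per-channel correlation in [-1, 1]
    return float(np.mean(1.0 - ncc))

# Toy data standing in for features warped from a source view (assumption).
rng = np.random.default_rng(0)
ref = rng.normal(size=(4, 9, 8))
src = 2.0 * ref + 3.0  # same structure, different scale and offset
loss_patch = patch_feature_loss(ref, src)
loss_pixel = pixel_feature_loss(ref[:, 4, :], src[:, 4, :])  # patch centers only
```

In this toy setup the patch-wise score is unaffected by the per-channel scale and offset, whereas the pixel-wise score compares single feature vectors and is not. This is one plausible intuition for why a patch-based loss can be more robust, under the assumptions stated above.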

Paper Structure (11 sections, 9 equations, 8 figures, 4 tables)

Figures (8)

  • Figure 1: (a) Pixel-wise similarity for RGB images, MVS features (MVSFormer; Cao et al., 2022), and matching features (QuadTree; Tang et al., 2022) on non-Lambertian surfaces. (b) Multi-view consistency of feature priors: given a ray direction, the geometry network extracts surface points, which are used to project and align source features to the reference view for optimizing the patch-wise photometric consistency loss.
  • Figure 2: Approach overview. Our method is based on NeuS (Wang et al., 2021). After obtaining surface points from the geometry network $f_{\theta_g}$, we further apply the multi-view consistency loss based on features from pre-trained models to improve the reconstruction quality.
  • Figure 3: Two different ways to locate surface points during network training at 5000 iterations. Compared with volume rendering, finding the surface points via the paper's SDF zero-level-set equation is more accurate. Accurate surface points are crucial for applying the consistency loss.
  • Figure 4: Quantitative results of using prior features from different pretext tasks for two scenarios: (a) with the lowest resolution features (high-level features), and (b) with the highest resolution features (low-level features). The red lines indicate the performance of NeuS, with lower values indicating better performance in terms of Chamfer Distance. High-resolution features generally outperform the low-resolution ones.
  • Figure 5: Qualitative results of different source-view selection strategies based on pixel-similarity and patch-similarity consistency losses on the Herzjesu scene of the EPFL dataset.
  • ...and 3 more figures
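
The Figure 3 caption contrasts two ways of locating surface points along a ray. As a rough illustration of the zero-level-set option, the sketch below samples a signed distance function (SDF) along a ray, finds the first sample pair where the SDF changes from positive to non-positive, and linearly interpolates the depth of the crossing. This is a common generic technique, assumed here for illustration; the paper's own equation is not reproduced in this listing, and `first_zero_crossing`, `sphere_sdf`, and the unit-sphere scene are hypothetical names and data.

```python
import numpy as np

def first_zero_crossing(t, sdf):
    """Depth along the ray where the SDF first goes from positive to <= 0.

    t:   sample depths along the ray, shape (S,), increasing.
    sdf: SDF values at those samples, shape (S,).
    Returns None if the ray never enters the surface.
    """
    t = np.asarray(t, dtype=float)
    sdf = np.asarray(sdf, dtype=float)
    crossing = np.where((sdf[:-1] > 0) & (sdf[1:] <= 0))[0]
    if crossing.size == 0:
        return None
    i = crossing[0]
    # Linear interpolation between the two samples that bracket the surface.
    w = sdf[i] / (sdf[i] - sdf[i + 1])
    return float(t[i] + w * (t[i + 1] - t[i]))

def sphere_sdf(points, radius=1.0):
    """Analytic SDF of a sphere at the origin, standing in for a geometry network."""
    return np.linalg.norm(points, axis=-1) - radius

# A ray that starts outside the unit sphere and points at its center.
origin = np.array([0.0, 0.0, -3.0])
direction = np.array([0.0, 0.0, 1.0])
t = np.linspace(0.0, 6.0, 64)
points = origin + t[:, None] * direction
depth = first_zero_crossing(t, sphere_sdf(points))
surface_point = origin + depth * direction
```

In a pipeline like the one the captions describe, a point found this way would then be projected into the reference and source views to sample the features that the consistency loss compares. That projection step is omitted here because it depends on camera parameters not given in this listing.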