
SG-NeRF: Neural Surface Reconstruction with Scene Graph Optimization

Yiyang Chen, Siyan Dong, Xulong Wang, Lulu Cai, Youyi Zheng, Yanchao Yang

TL;DR

SG-NeRF addresses the challenge of 3D surface reconstruction with significantly noisy camera poses by jointly optimizing a neural radiance field and a scene graph initialized from SfM. It introduces adaptive inlier-outlier confidence, an IoU-based loss over matched keypoints, and a coarse-to-fine training strategy to mitigate outlier influence and stabilize optimization; the method minimizes a composite loss $L = L_{photo} + \alpha L_{reg} + \beta L_{IoU}$ with PSNR-informed confidence updates. The approach is validated on a newly collected dataset with challenging pose errors and on the DTU benchmark, where SG-NeRF achieves state-of-the-art or competitive performance, especially under severe pose noise. The work demonstrates robust, high-quality 3D reconstructions in practical scenarios and provides code and data to enable reproducibility and further research in pose-robust neural surface reconstruction.
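The composite objective in the summary can be sketched as a weighted sum of scalar terms. This is a minimal illustration under stated assumptions: the weights `alpha` and `beta` are left to the caller because no values are given here, and the idea that each image's photometric error is scaled by its confidence (so likely outliers contribute less) is an assumption about how the confidence is used, not the authors' exact formulation. The helper names `weighted_photo_loss` and `composite_loss` are hypothetical; `psnr` is the standard peak signal-to-noise ratio that a PSNR-informed update would rely on.

```python
import math
from typing import Sequence


def psnr(mse: float, max_val: float = 1.0) -> float:
    """Standard peak signal-to-noise ratio (dB) for a given mean squared error."""
    return 10.0 * math.log10(max_val ** 2 / mse)


def weighted_photo_loss(per_image_mse: Sequence[float],
                        confidences: Sequence[float]) -> float:
    """Hypothetical confidence-weighted average of per-image photometric errors.

    Assumption: images with low confidence (likely outlier poses) are down-weighted.
    """
    total_w = sum(confidences)
    if total_w == 0:
        raise ValueError("at least one image needs a non-zero confidence")
    return sum(c * m for c, m in zip(confidences, per_image_mse)) / total_w


def composite_loss(l_photo: float, l_reg: float, l_iou: float,
                   alpha: float, beta: float) -> float:
    """L = L_photo + alpha * L_reg + beta * L_IoU (alpha, beta chosen by the caller)."""
    return l_photo + alpha * l_reg + beta * l_iou
```

For example, `weighted_photo_loss([0.01, 0.09], [1.0, 0.0])` ignores the second image entirely, as if it had been flagged as an outlier, and `psnr(0.01)` is 20 dB.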

Abstract

3D surface reconstruction from images is essential for numerous applications. Recently, Neural Radiance Fields (NeRFs) have emerged as a promising framework for 3D modeling. However, NeRFs require accurate camera poses as input, and existing methods struggle to handle significantly noisy pose estimates (i.e., outliers), which are commonly encountered in real-world scenarios. To tackle this challenge, we present a novel approach that optimizes radiance fields with scene graphs to mitigate the influence of outlier poses. Our method incorporates an adaptive inlier-outlier confidence estimation scheme based on scene graphs, emphasizing images of high compatibility with the neighborhood and consistency in the rendering quality. We also introduce an effective intersection-over-union (IoU) loss to optimize the camera pose and surface geometry, together with a coarse-to-fine strategy to facilitate the training. Furthermore, we propose a new dataset containing typical outlier poses for a detailed evaluation. Experimental results on various datasets consistently demonstrate the effectiveness and superiority of our method over existing approaches, showcasing its robustness in handling outliers and producing high-quality 3D reconstructions. Our code and data are available at: https://github.com/Iris-cyy/SG-NeRF
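The scene-graph confidence described above (and, in the Figure 2 caption, "a confidence score based on the number of matching points among neighboring nodes") can be sketched with a plain adjacency map. This is an illustrative assumption, not the paper's formula: the `SceneGraph` layout, the normalization by the best-connected image, the PSNR damping rule, and the reference value `psnr_ref = 30.0` are all hypothetical choices made only to show how neighborhood compatibility and rendering quality could be combined.

```python
from typing import Dict

# Hypothetical layout: image id -> {neighbor image id: number of matched keypoints}
SceneGraph = Dict[str, Dict[str, int]]


def match_confidence(graph: SceneGraph) -> Dict[str, float]:
    """Score each node by its total matches with neighbors, scaled to [0, 1]."""
    totals = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    max_total = max(totals.values(), default=0) or 1  # avoid division by zero
    return {node: t / max_total for node, t in totals.items()}


def update_confidence(match_conf: Dict[str, float],
                      psnr_db: Dict[str, float],
                      psnr_ref: float = 30.0) -> Dict[str, float]:
    """Hypothetical rule: damp the confidence of images that render poorly."""
    return {node: c * min(psnr_db[node] / psnr_ref, 1.0)
            for node, c in match_conf.items()}


graph: SceneGraph = {
    "img0": {"img1": 120, "img2": 80},
    "img1": {"img0": 120},
    "img2": {"img0": 80},
    "img3": {},  # isolated node: no verified overlap with any other image
}
conf = match_confidence(graph)
conf = update_confidence(conf, {"img0": 32.0, "img1": 28.0, "img2": 15.0, "img3": 10.0})
```

Here the well-connected, well-rendered `img0` keeps full confidence, `img2` is penalized for both few matches and low PSNR, and the isolated `img3` ends at zero, which is the qualitative behavior an inlier-outlier weighting should have.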

Paper Structure (39 sections, 11 equations, 11 figures, 7 tables)

Figures (11)

  • Figure 1: 3D surface reconstruction (meshes) from images with camera poses that present significant noise. Directly training radiance fields with noisy poses can lead to incorrect structures (NeuS wang2021neus and Neuralangelo li2023neuralangelo). Recent approaches that focus on optimizing camera poses (BARF lin2021barf*, SCNeRF jeong2021self*, L2G-NeRF chen2023local*, and Joint-TensoRF cheng2024improving*, where * denotes their integration of NeuS for surface modeling) also fall short in handling significant pose errors, leading to unsatisfactory reconstruction. Our method remains robust to these pose errors and produces high-quality 3D reconstructions.
  • Figure 2: An overview of the proposed joint learning pipeline. Given a set of images, we first apply a Structure-from-Motion (SfM) algorithm to construct an initial scene graph (left), in which each node represents a posed image and an edge between two nodes indicates that the two images are likely to share overlapping regions. Next, the initial scene graph is sanitized. Each node is then assigned a confidence score based on the number of matching points with its neighboring nodes. We then train a Neural Radiance Field (NeRF) using the confidence-aware scene graph and the images. The training process alternates between fitting the radiance field and updating the scene graph. Finally, we extract the 3D scene mesh from the trained field.
  • Figure 3: Visualization of matches falsely established as correspondences between non-overlapping regions. The results are obtained using COLMAP schoenberger2016sfm with SuperPoint detone2018superpoint and SuperGlue sarlin2020superglue. Note that most of these matches are incorrect and can severely degrade the reconstruction quality.
  • Figure 4: Illustration of the two-view intersection-over-union (IoU) loss in 2D, which extends naturally to 3D. Given a pair of matched keypoints in the source and reference images, maximizing the IoU between the two rays requires optimizing both the camera pose of the source image and the estimated density in the radiance field.
  • Figure 5: Qualitative comparisons on the proposed dataset. Our method is more robust to outlier poses, producing less distortion and better geometric detail. Due to space constraints, we display only the five top-performing results for each scene.
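The two-view IoU idea in Figure 4 can be illustrated with a generic soft (fuzzy) IoU. This sketch assumes, hypothetically, that the surface region along each ray (the source ray under its current pose and the reference ray through the matched keypoint) is represented by soft occupancy values in [0, 1] evaluated at one shared set of 3D sample points; how those values are derived from the field's density is not specified here, and the min/max form below is a standard fuzzy-Jaccard choice rather than the paper's exact loss. The function names `soft_iou` and `iou_loss` are illustrative.

```python
from typing import Sequence


def soft_iou(occ_src: Sequence[float], occ_ref: Sequence[float],
             eps: float = 1e-8) -> float:
    """Fuzzy IoU: sum of element-wise minima over sum of element-wise maxima."""
    if len(occ_src) != len(occ_ref):
        raise ValueError("both profiles must be sampled at the same 3D points")
    inter = sum(min(a, b) for a, b in zip(occ_src, occ_ref))
    union = sum(max(a, b) for a, b in zip(occ_src, occ_ref))
    return inter / (union + eps)


def iou_loss(occ_src: Sequence[float], occ_ref: Sequence[float]) -> float:
    """Loss that shrinks as the two surface regions overlap more."""
    return 1.0 - soft_iou(occ_src, occ_ref)
```

Identical profiles give an IoU close to 1, `[0, 1, 1, 0]` against `[0, 0, 1, 1]` gives about 1/3, and disjoint profiles give 0. In an actual training loop the same expression would be written in an autodiff framework so that gradients reach both the source camera pose and the density, which is the coupling the caption describes.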