Robust SG-NeRF: Robust Scene Graph Aided Neural Surface Reconstruction

Yi Gu, Dongjun Ye, Zhaorui Wang, Jiaxu Wang, Jiahang Cao, Renjing Xu

TL;DR

This work employs a detached color network that omits the viewing direction as input to minimize the impact caused by shape-radiance ambiguities, and integrates an inlier-outlier confidence estimation scheme, leveraging scene graph information gathered during the data preparation phase.
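The TL;DR pairs a view-independent color network with per-image confidence estimates. The sketch below shows one way such confidences could be tracked and used. It assumes three things: each image's confidence is an exponential moving average of a PSNR score from the detached color network, mapped into [0, 1]; images above a fixed threshold count as inliers; and the ray budget is split across images in proportion to confidence. The functions `update_confidence`, `allocate_rays` and `split_inliers`, the momentum, the PSNR range and the 0.5 threshold are all hypothetical, and the paper's actual indicator and update rule may differ.

```python
from typing import Dict, List


def update_confidence(prev: float, psnr: float, momentum: float = 0.9,
                      psnr_lo: float = 10.0, psnr_hi: float = 30.0) -> float:
    """Blend the previous confidence with a PSNR-derived score in [0, 1].

    Hypothetical: the PSNR is assumed to come from the detached
    (view-direction-free) color network. It is mapped linearly from
    [psnr_lo, psnr_hi] to [0, 1], clipped, then averaged with momentum.
    """
    score = (psnr - psnr_lo) / (psnr_hi - psnr_lo)
    score = min(1.0, max(0.0, score))
    return momentum * prev + (1.0 - momentum) * score


def allocate_rays(conf: Dict[str, float], total_rays: int) -> Dict[str, int]:
    """Split a ray budget across images in proportion to their confidence.

    Largest-remainder rounding keeps the counts summing exactly to total_rays.
    """
    total_conf = sum(conf.values())
    if total_conf <= 0:
        raise ValueError("at least one image needs positive confidence")
    exact = {k: total_rays * c / total_conf for k, c in conf.items()}
    counts = {k: int(v) for k, v in exact.items()}
    leftover = total_rays - sum(counts.values())
    # Hand the remaining rays to the images with the largest fractional parts.
    order = sorted(exact, key=lambda k: exact[k] - counts[k], reverse=True)
    for k in order[:leftover]:
        counts[k] += 1
    return counts


def split_inliers(conf: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the ids of images whose confidence exceeds the assumed inlier threshold."""
    return sorted(k for k, c in conf.items() if c > threshold)


if __name__ == "__main__":
    conf = {"img0": 0.5, "img1": 0.5, "img2": 0.5}
    # img2 mimics an outlier (e.g. mirrored) pose that renders poorly.
    psnr = {"img0": 28.0, "img1": 26.0, "img2": 12.0}
    for _ in range(20):
        conf = {k: update_confidence(conf[k], psnr[k]) for k in conf}
    print(split_inliers(conf))
    print(allocate_rays(conf, 1024))
```

Proportional allocation never starves an image completely, so a pose wrongly flagged early on can still recover its confidence. A hard inlier-only sampler would not have that property.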

Abstract

Neural surface reconstruction relies heavily on accurate camera poses as input. Even with advanced pose estimators such as COLMAP or ARKit, camera poses can still be noisy. Existing pose-NeRF joint optimization methods handle poses with small noise (inliers) effectively but struggle with large noise (outliers), such as mirrored poses. In this work, we focus on mitigating the impact of outlier poses. Our method integrates an inlier-outlier confidence estimation scheme that leverages scene graph information gathered during the data preparation phase. Unlike previous works that directly use rendering metrics as the reference, we employ a detached color network that omits the viewing direction as input to minimize the impact of shape-radiance ambiguities. This enhanced confidence updating strategy effectively differentiates between inlier and outlier poses, allowing us to sample more rays from inlier poses and construct more reliable radiance fields. Additionally, we introduce a re-projection loss based on the current Signed Distance Function (SDF) and pose estimates, strengthening the constraints between matching image pairs. For outlier poses, we adopt a Monte Carlo re-localization method to find better solutions. We also devise a scene graph updating strategy to provide more accurate information throughout training. We validate our approach on the SG-NeRF and DTU datasets, where experimental results show that our method consistently improves reconstruction quality and pose accuracy.
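The re-projection loss can be pictured with a small pinhole-camera sketch. A keypoint in image i is lifted to 3D using a depth value, assumed here to come from rendering the current SDF along that pixel's ray. The point is then moved into image j's camera frame using the current pose estimates, projected, and compared with its matched keypoint. The helper names, the world-to-camera convention `x_cam = R @ x_world + t`, and the shared intrinsics `K` are assumptions for illustration. The paper's exact loss may differ, for example in its robust kernel, weighting, or symmetric terms.

```python
import math
from typing import List, Sequence, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]
Pixel = Tuple[float, float]
Pose = Tuple[Mat3, Sequence[float]]  # (R, t) with x_cam = R @ x_world + t (assumed)


def matvec(m: Mat3, v: Sequence[float]) -> Vec3:
    """3x3 matrix times 3-vector."""
    return [sum(m[r][c] * v[c] for c in range(3)) for r in range(3)]


def transpose(m: Mat3) -> Mat3:
    return [[m[c][r] for c in range(3)] for r in range(3)]


def backproject(uv: Pixel, depth: float, K: Mat3) -> Vec3:
    """Lift pixel (u, v) with z-depth `depth` to a 3D point in the camera frame."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    u, v = uv
    return [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]


def project(p_cam: Sequence[float], K: Mat3) -> Pixel:
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    fx, fy, cx, cy = K[0][0], K[1][1], K[0][2], K[1][2]
    x, y, z = p_cam
    if z <= 0:
        raise ValueError("point lies behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)


def reprojection_error(uv_i: Pixel, uv_j: Pixel, depth_i: float, K: Mat3,
                       pose_i: Pose, pose_j: Pose) -> float:
    """Pixel distance between keypoint i re-projected into image j and its match."""
    R_i, t_i = pose_i
    R_j, t_j = pose_j
    p_ci = backproject(uv_i, depth_i, K)
    # Camera i -> world: x_world = R_i^T (x_cam - t_i)
    p_w = matvec(transpose(R_i), [p_ci[k] - t_i[k] for k in range(3)])
    # World -> camera j: x_cam = R_j x_world + t_j
    p_cj = [a + b for a, b in zip(matvec(R_j, p_w), t_j)]
    u, v = project(p_cj, K)
    return math.hypot(u - uv_j[0], v - uv_j[1])


def reprojection_loss(matches: List[Tuple[Pixel, Pixel]], depths_i: List[float],
                      K: Mat3, pose_i: Pose, pose_j: Pose) -> float:
    """Mean re-projection error over the matched keypoints of one image pair."""
    errors = [reprojection_error(ui, uj, d, K, pose_i, pose_j)
              for (ui, uj), d in zip(matches, depths_i)]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    I = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    K = [[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]]
    # Camera j is centered one unit along +x, so t_j = -R_j @ center_j = (-1, 0, 0).
    pose_i, pose_j = (I, [0.0, 0.0, 0.0]), (I, [-1.0, 0.0, 0.0])
    # The first match is geometrically consistent; the second is off by 10 px.
    matches = [((50.0, 50.0), (30.0, 50.0)), ((50.0, 50.0), (40.0, 50.0))]
    print(reprojection_loss(matches, [5.0, 5.0], K, pose_i, pose_j))  # prints 5.0
```

In an actual training loop, the depths would be rendered from the SDF and the loss written in an autodiff framework so that gradients reach both the field and the poses. Plain Python is used here only to make the geometry explicit.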

Paper Structure

This paper contains 22 sections, 13 equations, 14 figures, 3 tables.

Figures (14)

  • Figure 1: Reconstruction results on the SG-NeRF chen2024sg dataset. Both SG-NeRF chen2024sg and our method take the same initial poses as input, which contain significant noise. Camera poses are also shown, including optimized outlier poses, inlier poses, and ground-truth poses. More results are illustrated in the supplementary material.
  • Figure 2: Illustration of the pose ambiguity. The first row shows results for inliers and the second row for outliers. Images in the first column come from the COLMAP schoenberger2016sfm GUI and show that both poses are registered in front of the object. However, the ground-truth images in the second column show the opposite. The third column presents the rendering results of SG-NeRF chen2024sg, which uses the same color network as NeuS wang2021neus, with the view direction as input. As shown in the fourth column, our method incorporates an isolated color network, which recognizes this ambiguity well.
  • Figure 3: An overview of the proposed pipeline. Given the initial scene graph, we apply a confidence updating strategy based on an indicator from a detached color network, which identifies inlier and outlier poses. For inliers, we use a re-projection loss and an IoU loss to strengthen the geometric constraints. For outliers, we use a Monte Carlo re-localization method to find better initializations. The scene graph is also updated based on the current geometry and pose estimates. Finally, our method reconstructs the 3D mesh from the trained field and rectifies both inlier and outlier poses with high accuracy. The color coding is the same as in Fig. 1.
  • Figure 4: Illustration of scene graph updating. We filter out incorrectly matched keypoint pairs (red lines) and keep the correct pairs (blue lines). Image pairs with relatively few matches are selected for clearer visualization. Additional results are detailed in the supplementary material.
  • Figure 5: Qualitative comparisons on the SG-NeRF chen2024sg dataset. Our method can generally recover high-fidelity geometry with only one-stage training. More visual comparisons are provided in supplementary materials.
  • ...and 9 more figures
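Figure 4 describes pruning incorrectly matched keypoint pairs from the scene graph. One way to sketch such an update starts from the assumption that each edge (image pair) stores its matches, each with a re-projection error in pixels under the current SDF and poses. The update then drops matches above a pixel threshold and removes edges left with too few matches. The `Match` layout, `update_scene_graph`, the 4-pixel threshold and the minimum match count are hypothetical, since this summary does not state the paper's actual filtering criterion.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (image_i, image_j) edge in the scene graph


@dataclass
class Match:
    uv_i: Tuple[float, float]  # keypoint in image i (pixels)
    uv_j: Tuple[float, float]  # matched keypoint in image j (pixels)
    reproj_err: float  # assumed: re-projection error under the current SDF and poses


def update_scene_graph(graph: Dict[Pair, List[Match]], max_err: float = 4.0,
                       min_matches: int = 8) -> Dict[Pair, List[Match]]:
    """Drop matches with large re-projection error, then drop weakly supported edges.

    Both thresholds are hypothetical defaults chosen for illustration.
    """
    updated: Dict[Pair, List[Match]] = {}
    for pair, matches in graph.items():
        kept = [m for m in matches if m.reproj_err <= max_err]
        if len(kept) >= min_matches:
            updated[pair] = kept
    return updated


if __name__ == "__main__":
    good = [Match((10.0, 20.0), (12.0, 21.0), 1.5) for _ in range(9)]
    bad = [Match((10.0, 20.0), (300.0, 5.0), 42.0) for _ in range(3)]
    graph = {("img0", "img1"): good + bad, ("img0", "img2"): good[:4] + bad}
    for pair, matches in update_scene_graph(graph).items():
        print(pair, len(matches))
```

Recomputing `reproj_err` periodically as the geometry and poses improve is what would let a false match that looked plausible early in training be removed later. The minimum match count keeps near-empty pairs from feeding the re-projection loss.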