PoRF: Pose Residual Field for Accurate Neural Surface Reconstruction

Jia-Wang Bian, Wenjing Bian, Victor Adrian Prisacariu, Philip Torr

TL;DR

The pose residual field (PoRF) is a novel implicit representation that uses an MLP to regress pose updates. Because the MLP's parameters are shared, it leverages global information over the entire sequence, which makes it more robust than conventional pose parameter optimisation.

Abstract

Neural surface reconstruction is sensitive to camera pose noise, even when state-of-the-art pose estimators like COLMAP or ARKit are used. More importantly, existing Pose-NeRF joint optimisation methods have struggled to improve pose accuracy in challenging real-world scenarios. To overcome these challenges, we introduce the pose residual field (PoRF), a novel implicit representation that uses an MLP to regress pose updates. This is more robust than conventional pose parameter optimisation because parameter sharing leverages global information over the entire sequence. Furthermore, we propose an epipolar geometry loss that enhances the supervision by leveraging the correspondences exported from COLMAP results, without extra computational overhead. Our method yields promising results. On the DTU dataset, we reduce the rotation error by 78% for COLMAP poses, which reduces the reconstruction Chamfer distance from 3.48mm to 0.85mm. On the MobileBrick dataset, which contains casually captured unbounded 360-degree videos, our method refines ARKit poses and improves the reconstruction F1 score from 69.18 to 75.67, outperforming the result obtained with the dataset-provided ground-truth pose (75.14). These achievements demonstrate the efficacy of our approach in refining camera poses and improving the accuracy of neural surface reconstruction in real-world scenarios.
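
The parameter-sharing idea can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 6-DoF pose vector (axis-angle rotation plus translation), the layer sizes, the zero-initialised output layer, the `scale` factor, and the additive composition in `refine` are all assumptions made for illustration, as are the function names. The point it shows is that a single set of MLP weights produces the residual for every frame, instead of each frame owning an independent set of pose parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_mlp(in_dim=7, hidden=64, out_dim=6):
    """One hidden-layer MLP shared by all frames (sizes are illustrative)."""
    return {
        "W1": rng.normal(0.0, 0.1, (in_dim, hidden)),
        "b1": np.zeros(hidden),
        # Assumption: a zero-initialised output layer, so every residual
        # starts at zero and refinement begins from the initial poses.
        "W2": np.zeros((hidden, out_dim)),
        "b2": np.zeros(out_dim),
    }

def pose_residual(params, frame_ids, init_poses, num_frames):
    """Predict a 6-DoF residual for every frame with the same weights."""
    t = (frame_ids / max(num_frames - 1, 1))[:, None]  # index scaled to [0, 1]
    x = np.concatenate([t, init_poses], axis=1)         # (N, 1 + 6) input
    h = np.maximum(x @ params["W1"] + params["b1"], 0.0)  # ReLU
    return h @ params["W2"] + params["b2"]

def refine(params, frame_ids, init_poses, scale=1e-2):
    """Compose residuals with the initial poses (additive, by assumption)."""
    delta = pose_residual(params, frame_ids, init_poses, len(frame_ids))
    return init_poses + scale * delta

# Hypothetical 5-frame sequence with random initial poses.
frame_ids = np.arange(5, dtype=float)
init_poses = rng.normal(size=(5, 6))
params = init_mlp()
refined = refine(params, frame_ids, init_poses)
```

In a joint optimisation, the gradient of the losses would flow into `params` rather than into per-frame pose variables, so an update driven by one frame also changes the predicted residuals of the others.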

Paper Structure (47 sections, 10 equations, 11 figures, 8 tables)


Figures (11)

  • Figure 1: Reconstruction results on the DTU dataset (scan24). All meshes were generated using Voxurf [wu2022voxurf]. The Chamfer Distance (mm) is reported. BARF [lin2021barf], SPARF [truong2023sparf], and our method all take the COLMAP pose as the initial pose. More results are illustrated in the supplementary material.
  • Figure 2: Joint optimisation pipeline. The proposed model consists of a pose residual field $F_\theta$ and a neural surface reconstruction module $G_\phi$. PoRF takes the frame index and the initial camera pose as input and employs an MLP to learn the pose residual, which is composited with the initial pose to obtain the predicted pose. The output pose is used to compute the neural rendering losses with the NSR module and the epipolar geometry loss with pre-computed 2D correspondences. Parameters $\theta$ and $\phi$ are updated during back-propagation.
  • Figure 3: Pose errors during training on the DTU dataset. The results are averaged over 15 test scenes. Baseline (B) denotes the naive joint optimisation of NSR and pose parameters.
  • Figure 4: Reconstruction results on DTU (top) and MobileBrick (bottom) datasets. The initial pose denotes the COLMAP pose on DTU, and the ARKit pose on MobileBrick. We use the standard evaluation metrics, i.e., Chamfer distance (mm) for DTU and F1 score for MobileBrick.
  • Figure 5: Qualitative reconstruction results on the DTU dataset. All meshes were generated using Voxurf [wu2022voxurf]. All refinement methods take the COLMAP [schoenberger2016sfm] pose as the initial pose.
  • ...and 6 more figures
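
The epipolar geometry loss mentioned in the Figure 2 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's exact formulation: it uses the Sampson distance as one common epipolar error, takes correspondences in normalised image coordinates (intrinsics already removed), and the names `skew` and `sampson_loss` are hypothetical. The relative pose between two frames defines an essential matrix, and the loss measures how far the pre-computed 2D matches deviate from the epipolar constraint it implies.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def sampson_loss(R, t, x1, x2):
    """Mean Sampson distance for correspondences x1 <-> x2.

    R, t map points from camera 1 to camera 2 (X2 = R @ X1 + t).
    x1, x2 are (N, 3) homogeneous points in normalised image coordinates.
    """
    E = skew(t) @ R                      # essential matrix
    Ex1 = x1 @ E.T                       # row i is E @ x1_i
    Etx2 = x2 @ E                        # row i is E.T @ x2_i
    residual = np.sum(x2 * Ex1, axis=1)  # x2_i^T E x1_i
    denom = Ex1[:, 0]**2 + Ex1[:, 1]**2 + Etx2[:, 0]**2 + Etx2[:, 1]**2
    return np.mean(residual**2 / (denom + 1e-12))

# Synthetic check: project random 3D points into two cameras.
rng = np.random.default_rng(0)
X1 = rng.uniform([-1, -1, 4], [1, 1, 6], size=(50, 3))
angle = 0.1
R = np.array([[np.cos(angle), 0.0, np.sin(angle)],
              [0.0, 1.0, 0.0],
              [-np.sin(angle), 0.0, np.cos(angle)]])
t = np.array([0.5, 0.0, 0.1])
X2 = X1 @ R.T + t
x1 = X1 / X1[:, 2:3]
x2 = X2 / X2[:, 2:3]

loss_true = sampson_loss(R, t, x1, x2)                         # correct pose
loss_noisy = sampson_loss(R, t + np.array([0.0, 0.2, 0.0]), x1, x2)  # perturbed
```

With the correct relative pose the loss vanishes up to floating-point error, while a perturbed translation yields a clearly larger value. Because the matches are computed once in advance, evaluating such a loss needs no rendering, which is consistent with the abstract's claim of no extra computational overhead.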