RePose-NeRF: Robust Radiance Fields for Mesh Reconstruction under Noisy Camera Poses

Sriram Srinivasan, Gautam Ramachandra

TL;DR

RePose-NeRF tackles robust 3D reconstruction from multi-view images with noisy poses by jointly refining camera extrinsics $\boldsymbol{\theta}_i \in \mathfrak{se}(3)$ and learning an implicit scene representation. It introduces a two-stage pipeline: Stage 1 learns a grid-based NeRF with pose refinement and an SDF-based geometry representation, while Stage 2 performs differentiable mesh refinement and texture baking to produce editable, view-consistent meshes compatible with standard graphics and robotics tools. The method combines a coarse-to-fine hashing strategy, Eikonal and entropy regularizations, and occupancy-grid sampling to achieve fast convergence and robust reconstruction under pose uncertainty, outperforming BARF on the LLFF and Blender NeRF-Synthetic datasets. The resulting textured meshes can be deployed directly in perception, manipulation, and simulation pipelines, bridging neural implicit representations with practical robotic applications.
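To make the pose parameterization concrete, the sketch below shows one common way a correction $\boldsymbol{\theta}_i \in \mathfrak{se}(3)$ can be turned into a rigid transform and applied to a noisy camera pose. It is an illustration only: the ordering of the 6-vector (translation part first, rotation part second), the left-multiplication convention, and the helper names `hat`, `se3_exp`, and `refine_pose` are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def hat(w):
    """Skew-symmetric matrix such that hat(w) @ v == np.cross(w, v)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz, wy],
                     [wz, 0.0, -wx],
                     [-wy, wx, 0.0]])

def se3_exp(xi):
    """Exponential map: 6-vector xi = (rho, omega) -> 4x4 rigid transform.

    Assumed ordering: rho (translation part) first, omega (axis-angle) second.
    """
    xi = np.asarray(xi, dtype=float)
    rho, omega = xi[:3], xi[3:]
    theta = np.linalg.norm(omega)
    W = hat(omega)
    if theta < 1e-8:
        # Low-order Taylor expansion near the identity avoids dividing by ~0.
        R = np.eye(3) + W + 0.5 * (W @ W)
        V = np.eye(3) + 0.5 * W
    else:
        A = np.sin(theta) / theta
        B = (1.0 - np.cos(theta)) / theta**2
        C = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + A * W + B * (W @ W)   # Rodrigues' formula
        V = np.eye(3) + B * W + C * (W @ W)   # couples rotation and translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def refine_pose(T_noisy, xi):
    """Apply the correction exp(xi) on the left of a noisy 4x4 pose (assumed convention)."""
    return se3_exp(xi) @ T_noisy
```

In a joint optimization of this kind, `xi` would typically be a learnable per-image parameter initialized to zero, so that every camera starts at its noisy input pose and is updated by the same photometric loss that trains the scene representation.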

Abstract

Accurate 3D reconstruction from multi-view images is essential for downstream robotic tasks such as navigation, manipulation, and environment understanding. However, obtaining precise camera poses in real-world settings remains challenging, even when calibration parameters are known. This limits the practicality of existing NeRF-based methods that rely heavily on accurate extrinsic estimates. Furthermore, their implicit volumetric representations differ significantly from the widely adopted polygonal meshes, making rendering and manipulation inefficient in standard 3D software. In this work, we propose a robust framework that reconstructs high-quality, editable 3D meshes directly from multi-view images with noisy extrinsic parameters. Our approach jointly refines camera poses while learning an implicit scene representation that captures fine geometric detail and photorealistic appearance. The resulting meshes are compatible with common 3D graphics and robotics tools, enabling efficient downstream use. Experiments on standard benchmarks demonstrate that our method achieves accurate and robust 3D reconstruction under pose uncertainty, bridging the gap between neural implicit representations and practical robotic applications.

Paper Structure

This paper contains 38 sections, 21 equations, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Decomposition of the rendered image into diffuse and specular components. The final color image combines both components to produce the complete appearance.
  • Figure 2: Pose refinement results over optimization iterations. The initial estimates exhibit significant noise, which is progressively reduced as the optimization proceeds, leading to accurate and stable pose recovery after 30,000 iterations.
  • Figure 3: The coarse mesh contains over-dense regions and surface irregularities, while the refined mesh shows adaptive subdivision and decimation, producing a smoother and more efficient representation.
  • Figure 4: Comparison between coarse and fine mesh representations. The coarse mesh captures the overall geometry, while the fine mesh provides detailed surface information.
  • Figure 5: Qualitative visualization of rendering results for four NeRF-Synthetic classes (Ficus, Lego, Drums, Ship). Each column shows the ground truth (GT), predicted RGB render, estimated depth, and depth error map.
  • ...and 2 more figures