Continuous Pose for Monocular Cameras in Neural Implicit Representation

Qi Ma, Danda Pani Paudel, Ajad Chhatkuli, Luc Van Gool

TL;DR

Under the assumption of continuous motion, the paper shows that changes in camera pose may lie on a manifold with fewer than 6 degrees of freedom (DOF). This low-DOF representation is applied in vSLAM settings, showing impressive camera tracking performance.

Abstract

In this paper, we showcase the effectiveness of optimizing monocular camera poses as a continuous function of time. The camera poses are represented by an implicit neural function that maps a given time to the corresponding camera pose. The mapped camera poses are then used in downstream tasks where joint camera pose optimization is also required. In doing so, the network parameters, which implicitly represent the camera poses, are optimized. We evaluate the proposed method in four diverse experimental settings: (1) NeRF from noisy poses; (2) NeRF from asynchronous events; (3) Visual Simultaneous Localization and Mapping (vSLAM); and (4) vSLAM with IMUs. In all four settings, the proposed method performs significantly better than the compared baselines and state-of-the-art methods. Additionally, under the assumption of continuous motion, we find that changes in pose may lie on a manifold with fewer than 6 degrees of freedom (DOF). We call this low-DOF motion representation the intrinsic motion and use it in vSLAM settings, showing impressive camera tracking performance.
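
The idea of representing poses as a neural function of time can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names (`init_pose_mlp`, `so3_exp`, `pose_at`), the one-hidden-layer network, the tanh activation, and the axis-angle-plus-translation output are all assumptions made for the example. In a real pipeline the parameters would be optimized through the downstream loss (for example NeRF rendering or vSLAM tracking), which is not shown here.

```python
import numpy as np

def init_pose_mlp(hidden=64, seed=0):
    """Create parameters for a tiny time -> pose network (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0, (1, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.01, (hidden, 6)),  # small init: poses start near identity
        "b2": np.zeros(6),
    }

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector (3,) -> rotation matrix (3, 3)."""
    theta = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    if theta < 1e-8:
        return np.eye(3) + K  # first-order approximation near zero rotation
    return (np.eye(3)
            + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

def pose_at(params, t):
    """Map a scalar timestamp t to a 4x4 camera pose matrix."""
    h = np.tanh(np.array([t]) @ params["W1"] + params["b1"])
    out = h @ params["W2"] + params["b2"]  # 3 rotation + 3 translation values
    T = np.eye(4)
    T[:3, :3] = so3_exp(out[:3])
    T[:3, 3] = out[3:]
    return T

params = init_pose_mlp()
T_a = pose_at(params, 0.5)     # pose at an image timestamp
T_b = pose_at(params, 0.5001)  # pose at any nearby time, e.g. an event or IMU timestamp
```

Because the pose is a smooth function of `t`, it can be queried at arbitrary timestamps, which is what makes a representation of this kind convenient for asynchronous events or high-frequency IMU readings.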

Paper Structure

This paper contains 40 sections, 8 equations, 20 figures, 16 tables.

Figures (20)

  • Figure 1: We showcase the benefits of optimizing the poses as a continuous function of time in diverse settings. We conduct exhaustive experiments on (a) rectifying inaccurate poses in RGB-only settings; (b) utilizing the asynchronous stream of events; (c) performing vSLAM in RGB-D camera settings; (d) integrating high-frequency IMUs in vSLAM. All experiments use neural functions for both camera poses and scene representations. Additionally, we exploit a low-DOF motion representation in the intrinsic motion frame $T_I$.
  • Figure 2: Patch reconstruction. Color-coded patches correspond to Fig. 3. Note that the 2D rigid motion of the patches exhibits continuity over time (left to right).
  • Figure 3: Qualitative results of 2D planar alignment. Given the ground truth (d) shown in Fig. 2 as input, the goal is to find the 2D rigid transformation for each patch and optimize the entire neural image. Our method achieves accurate alignment and high-fidelity image reconstruction, while the baselines fail due to local minima.
  • Figure 4: We introduce continuous errors on the camera trajectories and perform pose refinement in the NeRF setting. (a) Initial pose error; (b) results obtained using BARF (Lin et al., 2021), which uses a discrete set of poses; (c) results obtained using our continuous pose representation.
  • Figure 5: With and without calibration experiments. We investigate the effectiveness of our method under different deviations from the actual rotational axis. Our method can successfully reposition the object back to the center without additional calibration.
  • ...and 15 more figures