DPPE: Dense Pose Estimation in a Plenoxels Environment using Gradient Approximation

Christopher Kolios, Yeganeh Bahoo, Sajad Saeedi

TL;DR

This work introduces a 6-DoF monocular RGB-only pose estimation procedure for Plenoxels that seeks to recover the ground-truth camera pose after a perturbation. It employs a variation on classical template matching techniques, using stochastic gradient descent to optimize the pose by minimizing re-rendering error.
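As a rough illustration of a re-rendering objective, the sketch below scores a rendered image against the target over a random subset of pixels. It assumes images are NumPy arrays of shape (H, W, 3) with values in [0, 1] and uses a plain mean squared RGB error; the function names `sample_pixels` and `rerender_loss` are hypothetical, and the paper's exact loss may differ. Drawing a fresh pixel subset at each iteration is one way to make the descent stochastic.

```python
import numpy as np


def sample_pixels(height, width, n_samples, rng):
    """Draw n_samples distinct flat pixel indices from an H x W image."""
    return rng.choice(height * width, size=n_samples, replace=False)


def rerender_loss(rendered, target, pixel_idx):
    """Mean squared RGB error between two (H, W, 3) images at the sampled pixels.

    Assumption: a plain photometric MSE; the paper's loss may be weighted differently.
    """
    r = rendered.reshape(-1, 3)[pixel_idx]
    t = target.reshape(-1, 3)[pixel_idx]
    return float(np.mean((r - t) ** 2))


# Usage: identical images give zero loss; all-black vs all-white gives 1.0.
rng = np.random.default_rng(0)
idx = sample_pixels(32, 32, 256, rng)
white = np.ones((32, 32, 3))
black = np.zeros((32, 32, 3))
print(rerender_loss(white, white, idx))  # 0.0
print(rerender_loss(black, white, idx))  # 1.0
```

Subsampling trades gradient noise for fewer rendered rays per step, which is the same trade-off the image-subsampling ablation explores.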

Abstract

We present DPPE, a dense pose estimation algorithm that operates over a Plenoxels environment. Recent advances in neural radiance field techniques have shown them to be a powerful tool for environment representation, and more recent neural rendering algorithms have significantly reduced training time and increased rendering speed. Plenoxels introduced a fully differentiable radiance field technique that uses plenoptic volume elements stored in voxels for rendering, offering reduced training times and better rendering accuracy while eliminating the neural network component. In this work, we introduce a 6-DoF monocular RGB-only pose estimation procedure for Plenoxels, which seeks to recover the ground-truth camera pose after a perturbation. We employ a variation on classical template matching techniques, using stochastic gradient descent to optimize the pose by minimizing re-rendering error. In particular, we examine an approach that exploits the rapid rendering speed of Plenoxels to numerically approximate part of the pose gradient using central differencing. We show that such methods are effective for pose estimation. Finally, we perform ablations over key components of the problem space, with a particular focus on image subsampling and Plenoxel grid resolution. Project website: https://sites.google.com/view/dppe
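The central-differencing idea can be sketched as follows. Each pose coordinate is nudged by ±h and the loss is re-evaluated, so a 6-DoF pose costs 12 loss evaluations (re-renders) per gradient estimate, which is affordable only when rendering is fast. This is a minimal sketch under several assumptions: the pose is a 6-vector (axis-angle rotation plus translation), the whole gradient is approximated numerically (the paper approximates only part of it), and `toy_loss` is a hypothetical quadratic stand-in for rendering a Plenoxels grid at the pose and comparing the result with the target image.

```python
import numpy as np


def central_difference_grad(loss_fn, pose, h=1e-3):
    """Approximate d(loss)/d(pose) with central differences.

    Uses 2 * len(pose) loss evaluations: (f(x + h e_i) - f(x - h e_i)) / (2h).
    """
    pose = np.asarray(pose, dtype=float)
    grad = np.zeros_like(pose)
    for i in range(pose.size):
        step = np.zeros_like(pose)
        step[i] = h
        grad[i] = (loss_fn(pose + step) - loss_fn(pose - step)) / (2.0 * h)
    return grad


def optimize_pose(loss_fn, init_pose, lr=0.1, n_iters=200, h=1e-3):
    """Gradient descent on the pose using the numerical gradient."""
    pose = np.asarray(init_pose, dtype=float).copy()
    for _ in range(n_iters):
        pose -= lr * central_difference_grad(loss_fn, pose, h)
    return pose


# Hypothetical stand-in for "render at pose, compare with target image":
# a quadratic bowl centered on the ground-truth pose (rx, ry, rz, tx, ty, tz).
gt_pose = np.array([0.1, -0.2, 0.05, 0.3, 0.0, -0.1])


def toy_loss(pose):
    return float(np.sum((pose - gt_pose) ** 2))


estimate = optimize_pose(toy_loss, init_pose=np.zeros(6))
print(np.round(estimate, 4))  # recovers gt_pose
```

The toy loss is deterministic and convex; with a real renderer and subsampled pixels the loss is noisy and non-convex, so a purely local step like this can stall in a local minimum, as Figure 4 illustrates.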

Paper Structure (15 sections, 9 equations, 4 figures, 3 tables, 1 algorithm)

Figures (4)

  • Figure 1: Qualitative performance of DPPE on a pose from the Lego scene, showing the render from the current pose estimate (opaque) overlaid with the ground-truth image (transparent). DPPE starts from the initial perturbed pose (top left); as it runs, the pose estimate moves closer to the ground truth (left to right, top to bottom).
  • Figure 2: An illustration of the analysis pipeline. A star marks modules to which this work makes a significant contribution. First, a Plenoxels grid is trained for a scene. Then, a ground-truth pose is perturbed, with the extent of perturbation depending on the test being run. The trained Plenoxels grid, ground-truth pose, and perturbed pose are passed into the pose estimation process, which outputs the final pose after optimization. The scene can be rendered from any pose to visualize camera position and orientation.
  • Figure 3: Rotation (top) and translation (bottom) pose error for DPPE and iNeRF. From left to right, pose error is presented as average, median, and % failures, as a function of epoch. DPPE quickly converges towards the success criteria before tapering off, whereas iNeRF initially gets worse before rapidly converging to a very low error.
  • Figure 4: DPPE can be susceptible to local minima. (Left) The scene rendered from the initial pose (opaque) overlaid with the ground truth (transparent). (Right) The scene rendered from the final pose estimate after DPPE. Because DPPE only follows a local gradient approximation, it settles in a local minimum and the final pose is misaligned by an entire row.
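The evaluation loop described in Figures 2 and 3 (perturb a ground-truth pose, optimize, then report rotation and translation error) can be sketched as below. The sketch assumes camera rotations are 3x3 matrices and positions are 3-vectors, measures rotation error as the geodesic angle between rotations (a common choice, assumed here), and uses hypothetical success thresholds `rot_thresh` and `trans_thresh`; the paper's perturbation scheme and success criteria may differ.

```python
import numpy as np


def axis_angle_to_matrix(axis, angle_rad):
    """Rodrigues' formula: rotation by angle_rad about a (normalized) axis."""
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle_rad) * K + (1.0 - np.cos(angle_rad)) * (K @ K)


def perturb_pose(R_gt, t_gt, rot_deg, trans_mag, rng):
    """Rotate by rot_deg about a random axis and shift by trans_mag in a random direction."""
    R = axis_angle_to_matrix(rng.normal(size=3), np.radians(rot_deg)) @ R_gt
    d = rng.normal(size=3)
    return R, t_gt + trans_mag * d / np.linalg.norm(d)


def rotation_error_deg(R_est, R_gt):
    """Geodesic angle between two rotation matrices, in degrees."""
    cos_theta = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    # Clip guards against round-off pushing the value slightly outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))


def translation_error(t_est, t_gt):
    """Euclidean distance between camera positions, in scene units."""
    return float(np.linalg.norm(np.asarray(t_est) - np.asarray(t_gt)))


def summarize(rot_errs, trans_errs, rot_thresh=5.0, trans_thresh=0.05):
    """Mean, median and failure rate (%) per error type, mirroring Figure 3's columns.

    rot_thresh (degrees) and trans_thresh (scene units) are hypothetical values.
    """
    rot, tr = np.asarray(rot_errs), np.asarray(trans_errs)
    return {
        "rot_mean": float(rot.mean()), "rot_median": float(np.median(rot)),
        "rot_fail_pct": 100.0 * float(np.mean(rot > rot_thresh)),
        "trans_mean": float(tr.mean()), "trans_median": float(np.median(tr)),
        "trans_fail_pct": 100.0 * float(np.mean(tr > trans_thresh)),
    }


# Usage: a 10-degree, 0.2-unit perturbation is measured back as ~10 deg and ~0.2 units.
rng = np.random.default_rng(0)
R0, t0 = np.eye(3), np.zeros(3)
R1, t1 = perturb_pose(R0, t0, rot_deg=10.0, trans_mag=0.2, rng=rng)
print(rotation_error_deg(R1, R0), translation_error(t1, t0))
```

In a full run, the pose estimator would sit between `perturb_pose` and the error functions, and `summarize` would be called once per epoch to produce curves like those in Figure 3.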