Ternary-Type Opacity and Hybrid Odometry for RGB NeRF-SLAM

Junru Lin, Asen Nachkov, Songyou Peng, Luc Van Gool, Danda Pani Paudel

TL;DR

This work tackles RGB-only NeRF-SLAM by introducing a ternary-type opacity (TT) prior and a hybrid odometry (HO) pipeline. TT concentrates ray weights near surface depth via a softly-binarized decoder, enabling more accurate depth rendering and faster map convergence, while HO combines gradient-based warping for coarse pose initialization with bundle adjustment for refinement to boost speed and robustness. The approach yields state-of-the-art results on Replica and 7-Scenes in both tracking and mapping metrics, with reported speedups of about 6x over baselines like DIM-SLAM and substantial robustness to reduced BA iterations. Together, TT and HO provide a practical path toward efficient RGB-only NeRF-SLAM that leverages real-world surface priors to improve fidelity and speed.

Abstract

In this work, we address the challenge of deploying Neural Radiance Fields (NeRFs) in Simultaneous Localization and Mapping (SLAM) under the condition of lacking depth information, relying solely on RGB inputs. The key to unlocking the full potential of NeRF in such a challenging context lies in the integration of real-world priors. A crucial prior we explore is the binary opacity prior of 3D space with opaque objects. To effectively incorporate this prior into the NeRF framework, we introduce a ternary-type opacity (TT) model instead, which categorizes points on a ray intersecting a surface into three regions: before, on, and behind the surface. This enables a more accurate rendering of depth, subsequently improving the performance of image warping techniques. Therefore, we further propose a novel hybrid odometry (HO) scheme that merges bundle adjustment and warping-based localization. Our integrated approach of TT and HO achieves state-of-the-art performance on synthetic and real-world datasets, in terms of both speed and accuracy. This breakthrough underscores the potential of NeRF-SLAM in navigating complex environments with high fidelity.
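
To make the link between opacity and rendered depth concrete, here is a minimal sketch. It assumes the standard alpha-compositing weights used in NeRF-style rendering; the function names, sample depths, and opacity values are hypothetical and not taken from the paper. It compares a ternary-type opacity pattern along one ray with a smeared-out one.

```python
def render_weights(opacities):
    """Compositing weights: w_i = o_i * prod_{j<i} (1 - o_j)."""
    weights = []
    transmittance = 1.0  # fraction of the ray not yet absorbed
    for o in opacities:
        weights.append(o * transmittance)
        transmittance *= 1.0 - o
    return weights


def render_depth(opacities, depths):
    """Rendered depth along the ray: sum_i w_i * z_i."""
    return sum(w * z for w, z in zip(render_weights(opacities), depths))


# Hypothetical ray: five samples, with the true surface at depth 3.
depths = [1.0, 2.0, 3.0, 4.0, 5.0]

# Ternary-type pattern: 0 before the surface, 1 on it, arbitrary behind it.
ternary = [0.0, 0.0, 1.0, 0.7, 0.2]
# Soft pattern: opacity smeared around the surface.
soft = [0.1, 0.3, 0.5, 0.3, 0.1]

print(render_weights(ternary))        # all weight sits on the surface sample
print(render_depth(ternary, depths))  # matches the surface depth
print(render_depth(soft, depths))     # biased toward the camera
```

In this toy ray the values behind the surface never influence the result, because the transmittance is already zero once the ray hits a fully opaque sample. That is one way to read the "before, on, and behind" split.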

Paper Structure (15 sections, 6 theorems, 14 equations, 6 figures, 3 tables)

Key Result

Lemma IV.2

The desired weight constraints $w_i\in\{0,1\}$ for the totally ordered set $\mathcal{S}_l$ can be achieved if and only if at least one of the following statements is true,
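
The lemma's list of statements is not reproduced here. As background only, and assuming the standard compositing weights with per-sample opacity $o_i \in [0,1]$ (the symbol $o_i$ and the index $k$ are notation introduced for this illustration, not quoted from the lemma), one configuration that yields binary weights can be checked directly:

$$
w_i = o_i \prod_{j<i} (1 - o_j), \qquad o_j = 0 \ (j<k), \quad o_k = 1
\;\;\Longrightarrow\;\;
w_i =
\begin{cases}
0 \cdot \prod_{j<i} 1 = 0, & i < k,\\
1 \cdot \prod_{j<k} 1 = 1, & i = k,\\
o_i \cdot (1 - o_k) \cdot \prod_{j<i,\, j \neq k} (1 - o_j) = 0, & i > k.
\end{cases}
$$

In this case every $w_i$ lies in $\{0,1\}$ even though the opacities behind the surface ($i > k$) are left unconstrained. This is a sufficient example under the stated assumption, not a restatement of the lemma's conditions.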

Figures (6)

  • Figure 1: Qualitative and Quantitative Results. On the left, we show the rendered RGB and depth from a random pose after training on the whole sequence of Replica [straub2019replica] room-1. On the right, we show the speed, tracking error, and mapping error on Replica office-0. DIM-SLAM$^*$ refers to our re-implementation of DIM-SLAM [li2023dense].
  • Figure 2: Inferring the Opacity of a 3D Point. We utilize a set of multi-resolution feature grids which we interpolate at the desired 3D point. The collected features are passed to a neural network to predict the color and the opacity. Only the opacity is shown for clarity.
  • Figure 3: Opacity and Weights along a Ray. With the ternary-type opacity (TT), the weights along a randomly sampled ray are more concentrated near the depth with a higher peak. The dots on the curves represent the sampled points on the ray. Data obtained from Replica [straub2019replica] office-0 at the end of the training.
  • Figure 4: Illustration of Ternary Opacity. Randomly sampled 3D points have their features extracted through interpolation across multi-resolution feature grids. These features are input to a neural network to predict color and opacity. The resulting opacity histograms, shown above, demonstrate our method's effectiveness in generating the desired ternary-type opacity.
  • Figure 5: Hybrid Odometry. For simplicity, $t$ refers to the index of the first frame from the current frame group $\mathcal{G}_m$, and $\mathcal{T}_1$ is the first frame from the tracking frames. In gradient-based localization by warping, we initialize poses by constant velocity, render the depth for some pixels on the last few frames $\mathcal{T}$ from the previous frame group to get the point cloud $\mathcal{X}$, and update the camera poses by minimizing the reprojection loss of $\mathcal{X}$ to the current frame group $\mathcal{G}_m$. In bundle adjustment, we select several keyframes and, together with $\mathcal{G}_m$, use volume rendering (VR) to produce pixel values. The loss is backpropagated to the feature grids and the camera poses belonging to frames in $\mathcal{G}_m$.
  • ...and 1 more figure
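
The warping step described in the Figure 5 caption can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: it assumes a pinhole camera with intrinsics `(fx, fy, cx, cy)`, it measures the loss as a squared pixel distance to hypothetical target locations (the paper's loss may instead compare colors), and all function names and numbers are invented for the example.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with rendered depth to a 3D point in the camera frame."""
    return [(u - cx) / fx * depth, (v - cy) / fy * depth, depth]


def transform(R, t, p):
    """Apply the rigid motion x -> R x + t (R is 3x3 row-major, t has length 3)."""
    return [sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3)]


def project(p, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates."""
    return [fx * p[0] / p[2] + cx, fy * p[1] / p[2] + cy]


def reprojection_loss(pixels, depths, targets, R, t, K):
    """Mean squared pixel distance between warped pixels and their targets."""
    total = 0.0
    for (u, v), d, (tu, tv) in zip(pixels, depths, targets):
        wu, wv = project(transform(R, t, backproject(u, v, d, *K)), *K)
        total += (wu - tu) ** 2 + (wv - tv) ** 2
    return total / len(pixels)


# Hypothetical setup: two pixels with rendered depths in a previous frame.
K = (100.0, 100.0, 50.0, 50.0)  # assumed intrinsics (fx, fy, cx, cy)
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
pixels = [(50.0, 50.0), (60.0, 40.0)]
depths = [2.0, 4.0]

# Targets are generated from an assumed true relative translation.
t_true = [0.1, 0.0, 0.0]
targets = [project(transform(identity, t_true, backproject(u, v, d, *K)), *K)
           for (u, v), d in zip(pixels, depths)]

loss_at_init = reprojection_loss(pixels, depths, targets, identity, [0.0, 0.0, 0.0], K)
loss_at_true = reprojection_loss(pixels, depths, targets, identity, t_true, K)
print(loss_at_init, loss_at_true)
```

A gradient-based localizer would start from an initial pose guess (for example the constant-velocity prediction) and adjust `R` and `t` to drive this loss down. The quality of the rendered depths directly affects how well the warped pixels land, which is why sharper depth from TT helps this stage.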

Theorems & Definitions (12)

  • Lemma IV.2
  • proof
  • Lemma IV.3
  • proof
  • Lemma IV.4
  • proof
  • Theorem IV.5: Relevant Binary-type Opacity
  • proof
  • Proposition IV.6
  • proof
  • ...and 2 more