Benchmarking Implicit Neural Representation and Geometric Rendering in Real-Time RGB-D SLAM

Tongyan Hua, Lin Wang

TL;DR

This work establishes the first open-source benchmark framework to evaluate a wide spectrum of commonly used INRs and rendering functions for mapping and localization, and proposes explicit hybrid encoding for high-fidelity dense grid mapping, in keeping with RGB-D SLAM systems that put a premium on robustness and computational efficiency.

Abstract

Implicit neural representation (INR), in combination with geometric rendering, has recently been employed in real-time dense RGB-D SLAM. Despite active research, the field lacks a unified protocol for fair evaluation, which impedes its progress. In this work, we establish, to our knowledge, the first open-source benchmark framework to evaluate the performance of a wide spectrum of commonly used INRs and rendering functions for mapping and localization. The goal of our benchmark is to 1) gain an intuition of how different INRs and rendering functions impact mapping and localization and 2) establish a unified evaluation protocol w.r.t. the design choices that may impact mapping and localization. With the framework, we conduct a large suite of experiments that offer various insights into choosing INRs and geometric rendering functions: for example, the dense feature grid outperforms other INRs (e.g., tri-plane and hash grid), even when geometric and color features are jointly encoded for memory efficiency. To extend the findings to practical scenarios, a hybrid encoding strategy is proposed to combine the accuracy and completion strengths of grid-based and decomposition-based INRs. We further propose explicit hybrid encoding for high-fidelity dense grid mapping, in keeping with RGB-D SLAM systems that put a premium on robustness and computational efficiency.

Paper Structure (16 sections, 12 equations, 11 figures, 10 tables)

Figures (11)

  • Figure 1: (a) We establish a novel benchmark to evaluate different elements of NeRF, narrowly defined as a combination of INR function $\mathcal{F}$ and geometric rendering function $\mathcal{G}$, under the unified RGB-D SLAM paradigm. (b) The rendering loss guides the online updating of the pose from $\tilde{T}$ to $\hat{T}$ and of the parameters of $\mathcal{F}$. (c) A toy example illustrates the impact of various combinations of $\mathcal{F}$ and $\mathcal{G}$: $\mathcal{F}_2$ surpasses $\mathcal{F}_1$ in trajectory estimation and reconstruction fidelity but compromises completeness, which inspires new designs that combine the benefits of $\mathcal{F}_1$ and $\mathcal{F}_2$ into a hybrid encoding $\mathcal{F}_{1+2}$.
  • Figure 2: The proposed pipeline for NeRF-SLAM benchmark. The asterisk * indicates the existing two values for evaluation.
  • Figure 3: Illustration of new designs. For hybrid encoding, a point $p_i$ is (a) encoded using feature planes and a feature grid at a coarse level, and exclusively by a feature grid at a fine level. In contrast, for explicit hybrid encoding, $p_i$ is (b) solely encoded with an optimizable fine-level feature grid and decoded by MLP into an SDF residual $s^r_i$ and color $c_i$. This residual is then combined with the SDF prior stored in an explicit octree $s^{oc}_i$ to derive the inferred SDF value $s_i$.
  • Figure 4: Reconstruction of the 'morning apartment' sequence on the NeuralRGBD dataset. Our hybrid encoding strategy brings the best of two worlds.
  • Figure 5: Qualitative evaluation of explicit hybrid encoding on 'scene0000' sequence of ScanNet Dataset. Both NICE-SLAM and Ours run on the posed RGB-D stream to simulate an externally provided tracker.
  • ...and 6 more figures
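
The explicit hybrid encoding described in the Figure 3 caption can be illustrated with a small sketch: per-point features from a fine-level grid are decoded by an MLP into an SDF residual $s^r_i$ and a color $c_i$, and the residual is combined with the octree prior $s^{oc}_i$ to give $s_i$. The caption only says the two are "combined"; the plain sum used below is an assumption, as are the function names, layer sizes, random weights, and the sigmoid on the color output. None of these are taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def decode(features, w_hidden, w_out):
    """Toy two-layer MLP (hypothetical): features -> (SDF residual, RGB color)."""
    hidden = np.maximum(features @ w_hidden, 0.0)  # ReLU activation
    out = hidden @ w_out                           # shape (N, 4)
    sdf_residual = out[:, 0]                       # s^r_i
    color = 1.0 / (1.0 + np.exp(-out[:, 1:4]))     # c_i, squashed to [0, 1]
    return sdf_residual, color


def infer_sdf(sdf_prior, sdf_residual):
    """Assumed additive combination: s_i = s^oc_i + s^r_i."""
    return sdf_prior + sdf_residual


# Hypothetical inputs: 5 sample points with 8-dimensional grid features.
features = rng.normal(size=(5, 8))        # stand-in for fine-level grid features
sdf_prior = np.linspace(-0.2, 0.2, 5)     # stand-in for octree SDF prior s^oc_i
w_hidden = rng.normal(scale=0.1, size=(8, 16))
w_out = rng.normal(scale=0.1, size=(16, 4))

sdf_residual, color = decode(features, w_hidden, w_out)
sdf = infer_sdf(sdf_prior, sdf_residual)
```

Under this assumption the network only has to learn a correction to the stored prior rather than the full SDF, which is one plausible reading of why the caption separates the explicit prior from the optimizable grid.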