EvenNICER-SLAM: Event-based Neural Implicit Encoding SLAM

Shi Chen, Danda Pani Paudel, Luc Van Gool

TL;DR

By adding higher-frequency event image input, EvenNICER-SLAM significantly outperforms NICE-SLAM when the RGB-D input frequency is reduced, which suggests that event cameras can improve the robustness of dense SLAM systems against fast camera motion in real-world scenarios.

Abstract

The advancement of dense visual simultaneous localization and mapping (SLAM) has been greatly facilitated by the emergence of neural implicit representations. Neural implicit encoding SLAM, a typical example of which is NICE-SLAM, has recently demonstrated promising results in large-scale indoor scenes. However, these methods typically rely on temporally dense RGB-D image streams as input in order to function properly. When the input source does not support high frame rates or the camera moves too fast, these methods often crash or suffer significant degradation in tracking and mapping accuracy. In this paper, we propose EvenNICER-SLAM, a novel approach that addresses this issue by incorporating event cameras. Event cameras are bio-inspired cameras that respond to intensity changes instead of absolute brightness. Specifically, we integrate an event loss backpropagation stream into the NICE-SLAM pipeline to enhance camera tracking when RGB-D input is insufficient. Quantitative evaluation shows that EvenNICER-SLAM, with the inclusion of higher-frequency event image input, significantly outperforms NICE-SLAM with reduced RGB-D input frequency. Our results suggest the potential for event cameras to improve the robustness of dense SLAM systems against fast camera motion in real-world scenarios.
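
To make "respond to intensity changes instead of absolute brightness" concrete, the following is a minimal sketch of the idealized event-generation model commonly used for event cameras: a pixel fires whenever its log intensity has changed by a contrast threshold. The function name `event_image`, the threshold value of 0.2, and the `eps` offset are assumptions chosen for illustration; this is not the paper's Event-Net, nor the ESIM simulator.

```python
import numpy as np

def event_image(prev_gray, curr_gray, threshold=0.2, eps=1e-3):
    """Approximate the signed event count per pixel between two grayscale frames.

    Idealized model: one event per `threshold` of log-intensity change.
    `threshold` and `eps` are hypothetical values, not taken from the paper.
    """
    log_prev = np.log(prev_gray.astype(np.float64) + eps)
    log_curr = np.log(curr_gray.astype(np.float64) + eps)
    # Truncate toward zero: partial threshold crossings emit no event.
    # Positive counts are brightening events, negative counts are darkening events.
    return np.trunc((log_curr - log_prev) / threshold).astype(np.int64)

# Two tiny frames: one pixel doubles in brightness, one halves, two are unchanged.
prev = np.full((2, 2), 0.5)
curr = np.array([[0.5, 1.0], [0.25, 0.5]])
events = event_image(prev, curr)
```

Unchanged pixels produce no events regardless of how bright they are, which is the property that separates an event image from an ordinary intensity image.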

Paper Structure

This paper contains 17 sections, 3 equations, 8 figures, and 2 tables.

Figures (8)

  • Figure 1: A brief illustration of EvenNICER-SLAM. Our method exploits the high temporal resolution of event data to facilitate camera tracking in high-speed application scenarios. EvenNICER-SLAM typically takes lower-frequency RGB-D input and higher-frequency event image input and significantly outperforms its predecessor, NICE-SLAM [zhu2022nice], in both camera tracking and mapping.
  • Figure 2: An overview of the EvenNICER-SLAM framework. Before running EvenNICER-SLAM, we first generate ground truth event images between all adjacent frames in the original dataset with ESIM [rebecq2018esim]. During a run, we feed Event-Net with the currently rendered RGB image and the latest available ground truth RGB image prior to the current timestamp, in order to predict an event image between them. Accordingly, we sum up the ground truth event images at the corresponding timestamps and compute an event loss. Finally, the event loss is backpropagated through the differentiable Event-Net and renderer to optimize camera poses. The components within the green box are kept the same as in NICE-SLAM [zhu2022nice].
  • Figure 3: A simplified demonstration of the event loss alignment issue. A red/green pixel represents a positive/negative event. In terms of prediction error, we would intuitively expect $A < B < C$. However, if we implement the event loss as a simple pixelwise error, the result is a counterintuitive $A < C < B$.
  • Figure 4: The timeline of one tracking cycle of EvenNICER-SLAM with frame gap $\tau = 3$. The preprocessing part is illustrated above the time axis, and below the axis are the actual processes executed by EvenNICER-SLAM during a run.
  • Figure 5: Comparison of camera trajectories estimated by NICE-SLAM [zhu2022nice] and EvenNICER-SLAM (room2 from Replica [straub2019replica]). The frame gap of RGB-D images is set to $\tau = 5$. With the extra event supervision, EvenNICER-SLAM tracks the camera motion with higher accuracy.
  • ...and 3 more figures
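
The alignment issue described in the Figure 3 caption can be reproduced with a toy 1-D example. The sketch below assumes a hypothetical configuration (the figure's actual pixel layout is not given here): prediction A matches the ground truth exactly, prediction B places the event one pixel off, and prediction C predicts no event at all. A plain pixelwise L1 error ranks the near miss B as worse than the empty prediction C. The box-blurred variant is only one illustrative way to restore the intuitive ordering and is not claimed to be the loss used in the paper; the names `pixelwise_l1` and `blurred_l1` are made up for this example.

```python
import numpy as np

def pixelwise_l1(pred, gt):
    """Sum of absolute per-pixel differences between two event images."""
    return float(np.abs(pred - gt).sum())

def blurred_l1(pred, gt, width=3):
    """L1 error after box-blurring both inputs (an illustrative remedy only)."""
    kernel = np.ones(width) / width
    blur_pred = np.convolve(pred, kernel, mode="same")
    blur_gt = np.convolve(gt, kernel, mode="same")
    return float(np.abs(blur_pred - blur_gt).sum())

# Ground truth: a single positive event in the middle of a 7-pixel row.
gt = np.array([0, 0, 0, 1, 0, 0, 0], dtype=float)

predictions = {
    "A": gt.copy(),          # perfect prediction
    "B": np.roll(gt, 1),     # event misplaced by one pixel
    "C": np.zeros_like(gt),  # no event predicted
}

pixel = {name: pixelwise_l1(p, gt) for name, p in predictions.items()}
smooth = {name: blurred_l1(p, gt) for name, p in predictions.items()}

# Pixelwise error gives A < C < B, while the blurred error gives A < B < C.
print("pixelwise:", pixel)
print("blurred:  ", smooth)
```

The misplaced event is penalized twice by the pixelwise error, once for the missing event and once for the spurious one, which is why it scores worse than predicting nothing.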