
Event-based Simultaneous Localization and Mapping: A Comprehensive Survey

Kunping Huang, Sen Zhang, Jing Zhang, Dacheng Tao

TL;DR

This survey addresses the lack of comprehensive coverage of event-based vSLAM by organizing the literature into four main method classes: feature-based, direct, motion-compensation, and deep learning. It details event camera principles, event representations, and the general vSLAM pipeline, then evaluates state-of-the-art methods on pose and depth estimation tasks across standard benchmarks, highlighting their strengths and limitations under high-speed and HDR conditions. The work emphasizes that while deep learning and multi-sensor fusion show strong promise, challenges remain in handling sparse data, modeling noise, and transferring across scenes. The paper concludes with actionable directions, including robust representations, better multi-sensor benchmarks, and the integration of foundation models to advance practical, real-world event-based vSLAM systems.

Abstract

In recent decades, visual simultaneous localization and mapping (vSLAM) has gained significant interest in both academia and industry. It estimates camera motion and reconstructs the environment concurrently using visual sensors on a moving robot. However, conventional cameras suffer from hardware limitations such as motion blur and low dynamic range, which can degrade performance in challenging scenarios like high-speed motion and high dynamic range illumination. Recent studies have demonstrated that event cameras, a new type of bio-inspired visual sensor, offer advantages such as high temporal resolution, high dynamic range, low power consumption, and low latency. This paper presents a timely and comprehensive review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks. The review covers the working principle of event cameras and various event representations for preprocessing event data. It also groups event-based vSLAM methods into four main categories: feature-based, direct, motion-compensation, and deep learning methods, with detailed discussions and practical guidance for each approach. Furthermore, the paper evaluates state-of-the-art methods on various benchmarks, highlighting current challenges and future opportunities in this emerging research area. A public repository will be maintained to keep track of the rapid developments in this field at https://github.com/kun150kun/ESLAM-survey.
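The working principle of event cameras can be illustrated with the idealized event generation model that is common in the literature: a pixel fires an event when its log intensity has changed by at least a contrast threshold C since that pixel's last event, and the event's polarity is the sign of the change. The sketch below is a minimal illustration under simplifying assumptions (noise-free pixels, a single fixed threshold C = 0.2, and events stamped with the frame time rather than with interpolated timestamps). The function name `generate_events` is hypothetical and is not taken from the survey or from any existing simulator.

```python
import numpy as np

def generate_events(frames, timestamps, C=0.2, eps=1e-6):
    """Idealized event generation from intensity frames (illustrative sketch only).

    frames: array of shape (T, H, W) with non-negative intensities.
    timestamps: sequence of length T giving each frame's time.
    Returns a list of (t, x, y, polarity) tuples with polarity in {+1, -1}.
    """
    # Work in log intensity; eps avoids log(0) for black pixels.
    log_frames = np.log(np.asarray(frames, dtype=np.float64) + eps)
    # Per-pixel reference log intensity at the time of that pixel's last event.
    ref = log_frames[0].copy()
    events = []
    for k in range(1, len(log_frames)):
        diff = log_frames[k] - ref
        # Number of whole contrast-threshold crossings at each pixel.
        n = np.floor(np.abs(diff) / C).astype(int)
        ys, xs = np.nonzero(n)
        for y, x in zip(ys, xs):
            p = 1 if diff[y, x] > 0 else -1
            # One event per crossing, all stamped with the frame time (simplification).
            for _ in range(n[y, x]):
                events.append((timestamps[k], int(x), int(y), p))
            # Move the reference level by the crossings that produced events.
            ref[y, x] += p * n[y, x] * C
    return events
```

Real sensors add noise, refractory periods, and per-pixel threshold mismatch that this sketch ignores, which is one reason noise modeling remains an open challenge for event-based vSLAM.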

Paper Structure

This paper contains 47 sections, 2 equations, 7 figures, and 7 tables.

Figures (7)

  • Figure 1: Panels (a) and (b) show raw event data in the spatio-temporal space and projected onto the image plane, respectively, drawn as pseudo-colored red and blue points according to event polarity. Panel (c), adapted from 9879881, shows the camera trajectory and 3D depth map.
  • Figure 2: The structure of the event-based vSLAM algorithms and the taxonomy of the existing works.
  • Figure 3: Flow diagram of feature-based visual odometry (VO) algorithms that use only event data.
  • Figure 4: Diagrams of event-based direct methods. Direct methods align event data with corresponding events or image pixels to estimate camera poses and 3D maps.
  • Figure 5: Computation of the contrast threshold by reprojecting events from the event camera to a reference image. Figure adapted from 8094962.
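The projection shown in Figure 1(b) can be approximated by accumulating the events from a short time window into an image and coloring each pixel by its net polarity, which is one of the simplest event representations. The following is a minimal sketch, assuming events are `(t, x, y, p)` tuples with integer pixel coordinates, that positive events are drawn red and negative events blue (the figure's actual color assignment may differ), and that pixels whose polarities cancel out stay blank. The function `events_to_polarity_image` is hypothetical and is not the survey's visualization code.

```python
import numpy as np

def events_to_polarity_image(events, height, width):
    """Project events (t, x, y, p) onto the image plane as an RGB polarity image.

    Assumed convention: net-positive pixels are red, net-negative pixels are blue,
    and pixels with no events (or cancelling polarities) stay white.
    """
    img = np.full((height, width, 3), 255, dtype=np.uint8)  # white background
    net = np.zeros((height, width), dtype=np.int64)          # signed polarity sum per pixel
    for _, x, y, p in events:
        net[y, x] += 1 if p > 0 else -1
    img[net > 0] = (255, 0, 0)  # net-positive pixels -> red
    img[net < 0] = (0, 0, 255)  # net-negative pixels -> blue
    return img
```

Summing signed polarities discards the timestamps inside the window; representations such as time surfaces or voxel grids keep more of the temporal information at the cost of extra memory.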