Event-based Simultaneous Localization and Mapping: A Comprehensive Survey
Kunping Huang, Sen Zhang, Jing Zhang, Dacheng Tao
TL;DR
This survey addresses the lack of comprehensive coverage of event-based vSLAM by organizing the literature into four main method classes: feature-based, direct, motion-compensation, and deep learning. It details event camera principles, event representations, and the general vSLAM pipeline, then evaluates state-of-the-art methods on pose and depth estimation tasks on standard benchmarks, highlighting strengths and limitations under high-speed and HDR conditions. The work emphasizes that while deep learning and multi-sensor fusion show strong promise, challenges remain in sparse data, noise modeling, and transferability across scenes. The paper concludes with actionable directions, including robust representations, better multi-sensor benchmarks, and the integration of foundation models to advance practical, real-world event-based vSLAM systems.
Abstract
In recent decades, visual simultaneous localization and mapping (vSLAM) has gained significant interest in both academia and industry. It estimates camera motion and reconstructs the environment concurrently using visual sensors on a moving robot. However, conventional cameras suffer from hardware limitations, such as motion blur and low dynamic range, which can degrade performance in challenging scenarios like high-speed motion and high dynamic range illumination. Recent studies have demonstrated that event cameras, a new type of bio-inspired visual sensor, offer advantages such as high temporal resolution, high dynamic range, low power consumption, and low latency. This paper presents a timely and comprehensive review of event-based vSLAM algorithms that exploit the benefits of asynchronous and irregular event streams for localization and mapping tasks. The review covers the working principle of event cameras and various event representations for preprocessing event data. It also groups event-based vSLAM methods into four main categories: feature-based, direct, motion-compensation, and deep learning methods, with detailed discussions and practical guidance for each approach. Furthermore, the paper evaluates the state-of-the-art methods on various benchmarks, highlighting current challenges and future opportunities in this emerging research area. A public repository will be maintained to keep track of the rapid developments in this field at https://github.com/kun150kun/ESLAM-survey.
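
To make the notion of an event representation concrete, the following minimal sketch accumulates an asynchronous event stream into a simple event frame by summing polarities per pixel over a time window. It is an illustration under stated assumptions, not a method taken from the survey: events are assumed to be (x, y, t, p) tuples with polarity p of +1 or -1, and the function name accumulate_events and its window parameters are hypothetical.

```python
from typing import List, Tuple

# Assumed event layout: (x, y, t, p) = pixel column, pixel row,
# timestamp in seconds, polarity (+1 brightness increase, -1 decrease).
Event = Tuple[int, int, float, int]


def accumulate_events(
    events: List[Event],
    width: int,
    height: int,
    t_start: float,
    t_end: float,
) -> List[List[int]]:
    """Sum event polarities per pixel over the window [t_start, t_end)."""
    frame = [[0] * width for _ in range(height)]
    for x, y, t, p in events:
        # Skip events outside the time window or outside the sensor area.
        if t_start <= t < t_end and 0 <= x < width and 0 <= y < height:
            frame[y][x] += p
    return frame


# Hypothetical toy stream on a 3x2 sensor; the last event falls outside
# the 10 ms window and is therefore ignored.
events = [
    (0, 0, 0.001, 1),
    (0, 0, 0.002, 1),
    (1, 0, 0.003, -1),
    (2, 1, 0.020, 1),
]
frame = accumulate_events(events, width=3, height=2, t_start=0.0, t_end=0.010)
print(frame)
```

Accumulation of this kind discards the fine timing of events within the window, which is one reason alternative representations that retain temporal information are also used.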
