
ES-PTAM: Event-based Stereo Parallel Tracking and Mapping

Suman Ghosh, Valentina Cavinato, Guillermo Gallego

TL;DR

This work tackles robust visual odometry and SLAM under challenging conditions using event-based stereo cameras. It introduces ES-PTAM, a parallel-tracking-and-mapping system that couples an improved ray-density fusion mapper with an edge-map based tracker, operating directly on event streams. The approach scales to multi-camera setups and is validated on five real-world datasets, including a trinocular EVIMO2 sequence, where it frequently outperforms the state-of-the-art methods ESVO and EVO in pose accuracy and yields sharper semi-dense maps. The authors provide extensive qualitative and quantitative results and release an open-source implementation to foster research on event-based perception. Overall, this work advances purely event-based VO/SLAM, with potential impact on autonomous vehicles and mobile robots operating in HDR and high-speed environments.
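The ray-density fusion mapper mentioned above follows the space-sweep idea popularized by EMVS: every event's back-projected viewing ray votes into a disparity space image (DSI) anchored at a reference view, and depth at each pixel is read off where rays concentrate. The NumPy sketch below illustrates that idea under stated assumptions; it is not the ES-PTAM implementation. The function names (`build_dsi`, `harmonic_fuse`, `extract_depth`), the fixed depth planes, the nearest-voxel voting, and the single global confidence threshold are all hypothetical simplifications, and the paper's improved ray casting is not reproduced. For multi-camera setups, each camera's DSI is built in the same reference frame and the DSIs are fused voxel-wise; the harmonic mean used here is an assumption borrowed from related multi-camera EMVS work.

```python
import numpy as np


def build_dsi(events_px, cam_rotations, cam_centers, K, depths, width, height):
    """Accumulate event rays into a DSI anchored at the reference view (hypothetical helper).

    events_px:     (N, 2) event pixel coordinates (u, v).
    cam_rotations: (N, 3, 3) camera-to-reference rotation at each event's timestamp.
    cam_centers:   (N, 3) camera center in the reference frame at each event's timestamp.
    K:             (3, 3) pinhole intrinsics, assumed shared by all cameras.
    depths:        (D,) positive depth planes of the reference view.
    """
    dsi = np.zeros((len(depths), height, width), dtype=np.float32)
    K_inv = np.linalg.inv(K)
    for (u, v), R, c in zip(events_px, cam_rotations, cam_centers):
        d = R @ (K_inv @ np.array([u, v, 1.0]))  # ray direction in the reference frame
        if abs(d[2]) < 1e-9:
            continue  # ray parallel to the depth planes: no intersections
        lam = (depths - c[2]) / d[2]  # ray parameter where it crosses each plane Z = depth
        pts = c[None, :] + lam[:, None] * d[None, :]  # (D, 3) intersection points
        proj = pts @ K.T  # project intersections into the reference view
        px = np.rint(proj[:, 0] / proj[:, 2]).astype(int)
        py = np.rint(proj[:, 1] / proj[:, 2]).astype(int)
        ok = (lam > 0) & (px >= 0) & (px < width) & (py >= 0) & (py < height)
        dsi[np.nonzero(ok)[0], py[ok], px[ok]] += 1.0  # one vote per plane crossed
    return dsi


def harmonic_fuse(dsi_a, dsi_b):
    """Voxel-wise harmonic mean 2ab/(a+b): high only where both cameras have many rays."""
    s = dsi_a + dsi_b
    return np.where(s > 0, 2.0 * dsi_a * dsi_b / np.maximum(s, 1e-12), 0.0)


def extract_depth(dsi, depths, min_confidence):
    """Depth = plane of maximum ray density; confidence = that maximum (a simple global threshold)."""
    idx = np.argmax(dsi, axis=0)
    confidence = dsi.max(axis=0)
    depth = depths[idx].astype(float)
    depth[confidence < min_confidence] = np.nan  # mask pixels with too few converging rays
    return depth, confidence
```

In a stereo setting, one would call `build_dsi` once with the left-camera events and once with the right-camera events (poses expressed in the left reference frame), fuse the two volumes with `harmonic_fuse`, and then call `extract_depth`. The harmonic mean is a deliberate choice over a sum: a voxel supported by only one camera stays small, which suppresses spurious single-view ray crossings.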

Abstract

Visual Odometry (VO) and SLAM are fundamental components for spatial perception in mobile robots. Despite enormous progress in the field, current VO/SLAM systems are limited by the capabilities of their sensors. Event cameras are novel visual sensors that offer advantages over standard cameras, enabling robots to expand their operating range to challenging scenarios such as high-speed motion and high-dynamic-range illumination. We propose a novel event-based stereo VO system that combines two ideas: a correspondence-free mapping module that estimates depth by maximizing ray density fusion, and a tracking module that estimates camera poses by maximizing edge-map alignment. We evaluate the system comprehensively on five real-world datasets spanning a variety of camera types (manufacturers and spatial resolutions) and scenarios (driving, flying drone, hand-held, egocentric, etc.). The quantitative and qualitative results demonstrate that our method outperforms the state of the art by a margin on the majority of the test sequences, e.g., reducing trajectory error by 45% on the RPG dataset, 61% on the DSEC dataset, and 21% on the TUM-VIE dataset. To benefit the community and foster research on event-based perception systems, we release the source code and results: https://github.com/tub-rip/ES-PTAM
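The tracking idea in the abstract, estimating the camera pose by maximizing edge-map alignment, can be illustrated with a deliberately simplified sketch: rasterize recent events into a count image, blur it into a smooth edge map, and score a candidate pose by summing edge-map values at the projections of the semi-dense map points. Everything here is an assumption for illustration: the helper names (`project`, `edge_map`, `alignment_score`, `track_translation`) are hypothetical, the Gaussian-blurred event image is one common choice of edge map, and the search is an exhaustive grid over x/y translation with rotation held fixed, whereas the paper's tracker estimates full camera poses.

```python
import numpy as np


def project(points, R, t, K):
    """Project world points (N, 3) with a world-to-camera pose (R, t); return rounded pixels and a front mask."""
    pc = points @ R.T + t
    uvw = pc @ K.T
    z = uvw[:, 2]
    front = z > 1e-6
    uv = np.zeros((len(points), 2), dtype=int)
    uv[front] = np.rint(uvw[front, :2] / z[front, None]).astype(int)
    return uv, front


def edge_map(event_px, width, height, sigma=1.5):
    """Event count image smoothed by a separable Gaussian (hypothetical edge-map choice)."""
    ok = (event_px[:, 0] >= 0) & (event_px[:, 0] < width) & (event_px[:, 1] >= 0) & (event_px[:, 1] < height)
    img = np.zeros((height, width))
    np.add.at(img, (event_px[ok, 1], event_px[ok, 0]), 1.0)  # count events per pixel
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    g = np.exp(-0.5 * (x / sigma) ** 2)
    g /= g.sum()
    img = np.apply_along_axis(np.convolve, 1, img, g, mode="same")  # blur rows
    img = np.apply_along_axis(np.convolve, 0, img, g, mode="same")  # blur columns
    return img


def alignment_score(points, R, t, K, emap):
    """Sum of edge-map values at the projected map points (higher = better aligned)."""
    uv, front = project(points, R, t, K)
    h, w = emap.shape
    ok = front & (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return emap[uv[ok, 1], uv[ok, 0]].sum()


def track_translation(points, K, emap, candidates, R=None):
    """Exhaustive search over (tx, ty) with fixed rotation: a toy stand-in for full pose optimization."""
    R = np.eye(3) if R is None else R
    best_score, best_t = -np.inf, None
    for tx in candidates:
        for ty in candidates:
            t = np.array([tx, ty, 0.0])
            score = alignment_score(points, R, t, K, emap)
            if score > best_score:
                best_score, best_t = score, t
    return best_t, best_score
```

A practical tracker would typically replace the grid with a local optimizer over all six pose parameters, warm-started from the previous pose, and would interpolate the edge map rather than round to the nearest pixel so that the objective is smooth.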

Paper Structure

This paper contains 19 sections, 4 equations, 7 figures, and 4 tables.

Figures (7)

  • Figure 1: (a) 3D point cloud and camera trajectory estimated by our stereo VO pipeline for the TUM-VIE mocap-desk sequence [Klenk21iros], along with a snapshot of events, the confidence map, and the projected point cloud (overlaid on the events). Depth is color-coded from red (near) to blue (far). (b) Estimated camera poses over time, compared with ground truth (GT) and ESVO [Zhou20tro].
  • Figure 2: Proposed event-based stereo visual odometry pipeline consisting of two main modules: camera tracking (i.e., ego-motion estimation) and scene mapping.
  • Figure 3: Our improved ray casting produces clean confidence and depth maps. See its effect near the focus of expansion. Depth is color-coded from red (near) to blue (far).
  • Figure 4: Estimated trajectory of the zurich_city_04 sequence from the DSEC dataset vs. ground truth. The six evaluation segments (a)-(f) correspond to the rows of the paper's metrics table.
  • Figure 5: Results on DSEC using our method and ESVO. The sharper edge maps from our method allow us to track camera poses accurately using edge alignment.
  • ...and 2 more figures
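Figure 4 compares an estimated trajectory with ground truth, and the abstract reports percentage reductions in trajectory error. A common way to compute such numbers is the absolute trajectory error (ATE): rigidly align the estimated positions to the ground truth, then take the RMSE of the residuals; the relative reduction is then (baseline - ours) / baseline. The sketch below assumes time-associated position pairs and a rotation-plus-translation (Kabsch) alignment without scale; the paper's exact evaluation protocol (alignment type, segmentation, time association) is not stated in this summary, and the helper names are hypothetical.

```python
import numpy as np


def align_rigid(est, gt):
    """Least-squares rotation and translation mapping est (N, 3) onto gt (N, 3) (Kabsch, no scale)."""
    mu_e, mu_g = est.mean(axis=0), gt.mean(axis=0)
    H = (est - mu_e).T @ (gt - mu_g)  # 3x3 cross-covariance of centered point sets
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_g - R @ mu_e
    return R, t


def ate_rmse(est, gt):
    """Absolute trajectory error: RMSE of position residuals after rigid alignment."""
    R, t = align_rigid(est, gt)
    err = est @ R.T + t - gt
    return float(np.sqrt(np.mean(np.sum(err ** 2, axis=1))))


def relative_reduction(baseline_err, ours_err):
    """Percentage by which ours_err is lower than baseline_err."""
    return 100.0 * (baseline_err - ours_err) / baseline_err
```

For instance, with hypothetical errors of 1.00 for a baseline and 0.55 for a new method, `relative_reduction(1.00, 0.55)` gives about 45, the same form as the RPG figure quoted in the abstract.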