EVLoc: Event-based Visual Localization in LiDAR Maps via Event-Depth Registration

Kuangyi Chen, Jun Zhang, Friedrich Fraundorfer

TL;DR

EVLoc addresses robust 6-DoF localization by aligning event frames, each accumulated over a fixed time window $\Delta t$, with depth maps derived from existing LiDAR maps using a RAFT-based event-depth flow estimator. It introduces a frame-based event representation (Temporal-Spatial stable Time Surface, TSTS) and an Offset Alleviation Module (OAM) that compensates for bias in the ground-truth poses, enabling reliable 2D-3D correspondences for PnP pose estimation under challenging motion and lighting. The approach yields accurate pose refinement on indoor and outdoor LiDAR-map sequences, outperforming a conventional image-based baseline in high-dynamic-range and fast-motion scenarios. By relying on LiDAR maps as references and providing open-source code and models, EVLoc improves scalability and eases practical deployment for autonomous systems.

Abstract

Event cameras are bio-inspired sensors with notable features, including high dynamic range and low latency, which make them exceptionally suitable for perception in challenging scenarios such as high-speed motion and extreme lighting conditions. In this paper, we explore their potential for localization within pre-existing LiDAR maps, a critical task for applications that require precise navigation and mobile manipulation. Our framework follows a paradigm based on the refinement of an initial pose. Specifically, we first project LiDAR points into 2D space based on a rough initial pose to obtain depth maps, and then employ an optical flow estimation network to align events with LiDAR points in 2D space, followed by camera pose estimation using a PnP solver. To enhance geometric consistency between these two inherently different modalities, we develop a novel frame-based event representation that improves structural clarity. Additionally, given the varying degrees of bias observed in the ground truth poses, we design a module that predicts an auxiliary variable as a regularization term to mitigate the impact of this bias on network convergence. Experimental results on several public datasets demonstrate the effectiveness of our proposed method. To facilitate future research, both the code and the pre-trained models are made available online.
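The geometric steps around the flow network (rendering a depth map at the initial pose, then turning a predicted flow field into 2D-3D pairs for PnP) can be sketched as follows. This is a minimal illustration and not the released implementation: the function names `project_to_depth` and `correspondences_from_flow`, the pinhole intrinsics `fx, fy, cx, cy`, the use of 0.0 to mark empty pixels, and the convention that the flow points from depth-map pixels to event-frame pixels are all assumptions made for the example.

```python
def project_to_depth(points, R, t, fx, fy, cx, cy, width, height):
    """Render a sparse depth map from 3D map points using a z-buffer.

    points: iterable of (x, y, z) in the map frame.
    R, t:   assumed map-to-camera rotation (3x3 nested lists) and translation.
    Returns a height x width nested list; 0.0 marks pixels without a point.
    """
    depth = [[0.0] * width for _ in range(height)]
    for p in points:
        # Transform the point into the camera frame of the initial pose.
        xc, yc, zc = (sum(R[i][j] * p[j] for j in range(3)) + t[i]
                      for i in range(3))
        if zc <= 0.0:
            continue  # behind the camera
        # Pinhole projection to the nearest pixel.
        u = int(round(fx * xc / zc + cx))
        v = int(round(fy * yc / zc + cy))
        if 0 <= u < width and 0 <= v < height:
            # Keep the closest point per pixel.
            if depth[v][u] == 0.0 or zc < depth[v][u]:
                depth[v][u] = zc
    return depth


def correspondences_from_flow(depth, flow, fx, fy, cx, cy):
    """Pair each valid depth pixel with its flow-displaced 2D location.

    flow[v][u] is an assumed (du, dv) displacement from the depth map
    to the event frame. 3D points are expressed in the initial camera frame.
    """
    pts3d, pts2d = [], []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d <= 0.0:
                continue  # no LiDAR point landed on this pixel
            du, dv = flow[v][u]
            # Back-project the pixel to 3D using its depth.
            pts3d.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
            pts2d.append((u + du, v + dv))
    return pts3d, pts2d
```

Under these assumptions, the resulting pairs could be handed to a RANSAC-based PnP routine such as OpenCV's `cv2.solvePnPRansac`; the recovered transform would then be relative to the initial camera frame and would need to be composed with the initial pose.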

Paper Structure

This paper contains 18 sections, 10 equations, 6 figures, 2 tables, 1 algorithm.

Figures (6)

  • Figure 1: Overview of the proposed EVLoc. We assume that a coarse initial pose guess is available to serve as the starting point for precise localization. We project the 3D LiDAR map into 2D space to generate a depth map based on this initial pose. Simultaneously, events within a fixed time interval $\Delta t$ are converted into an event frame. The event frame and depth map are then input into the flow estimator to obtain the event-depth flow, which is used to warp the encoded depth features. These warped depth features, along with the encoded event features, are fed into the Offset Alleviation Module (OAM) to predict the auxiliary variable. Finally, a PnP solver calculates the camera pose from the 2D-3D correspondences obtained from the estimated flow.
  • Figure 2: Comparison of resulting event frames from TS [zhu2019unsupervised], SILC [manderscheid2019speed], and our proposed TSTS.
  • Figure 3: Offset exists between the event frame and the corresponding depth map generated based on the ground truth pose (in M3ED [Chaney_2023_CVPR]).
  • Figure 4: Overview of the devised offset alleviation module.
  • Figure 5: Rotation
  • ...and 1 more figures
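
For context on the event-frame step in Figure 1 and the TS baseline in Figure 2, the sketch below builds a conventional exponentially decayed time surface from the events inside a window $\Delta t$. It is not the proposed TSTS, whose construction is not given in this summary. The function name `time_surface`, the `(x, y, t, polarity)` event layout, and the decay constant `tau` are assumptions for illustration, and polarity is ignored for simplicity.

```python
import math


def time_surface(events, width, height, t_ref, delta_t, tau):
    """Build a basic time surface from events in (t_ref - delta_t, t_ref].

    events: iterable of (x, y, t, polarity); polarity is ignored here.
    Each pixel stores exp(-(t_ref - t_last) / tau), where t_last is the
    timestamp of the most recent event at that pixel; 0.0 if none fired.
    """
    latest = [[None] * width for _ in range(height)]
    for x, y, t, _polarity in events:
        in_window = t_ref - delta_t < t <= t_ref
        if in_window and 0 <= x < width and 0 <= y < height:
            # Keep only the most recent timestamp per pixel.
            if latest[y][x] is None or t > latest[y][x]:
                latest[y][x] = t
    return [[0.0 if t is None else math.exp(-(t_ref - t) / tau)
             for t in row] for row in latest]
```

Pixels that fired recently get values near 1 and older ones decay toward 0, so edges swept by fast motion leave trails; that kind of blur is one reason alternative representations with clearer structure are of interest for registration against depth maps.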