LEAR: Learning Edge-Aware Representations for Event-to-LiDAR Localization

Kuangyi Chen, Jun Zhang, Yuxi Hu, Yi Zhou, Friedrich Fraundorfer

TL;DR

LEAR is a dual-task learning framework that jointly estimates edge structures and dense event-depth flow fields to bridge the sensing-modality divide. It outperforms the best prior method on several popular and challenging datasets.

Abstract

Event cameras offer high-temporal-resolution sensing that remains reliable under high-speed motion and challenging lighting, making them promising for localization from LiDAR point clouds in GPS-denied and visually degraded environments. However, aligning sparse, asynchronous events with dense LiDAR maps is fundamentally ill-posed, as direct correspondence estimation suffers from modality gaps. We propose LEAR, a dual-task learning framework that jointly estimates edge structures and dense event-depth flow fields to bridge the sensing-modality divide. Instead of treating edges as a post-hoc aid, LEAR couples them with flow estimation through a cross-modal fusion mechanism that injects modality-invariant geometric cues into the motion representation, and an iterative refinement strategy that enforces mutual consistency between the two tasks over multiple update steps. This synergy produces edge-aware, depth-aligned flow fields that enable more robust and accurate pose recovery via Perspective-n-Point (PnP) solvers. On several popular and challenging datasets, LEAR achieves superior performance over the best prior method. The source code, trained models, and demo videos are made publicly available online.
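
To make the last step of the pipeline concrete, the following is a minimal sketch of how a dense event-depth flow field could be turned into the 2D-3D correspondences that a PnP solver consumes. It is not the authors' implementation. The function name `flow_to_correspondences`, the intrinsics `fx, fy, cx, cy`, and the convention that empty depth pixels are `None` are all assumptions made for illustration. Each valid depth pixel is back-projected to a 3D point in the frame of the initial pose guess, and the flow vector at that pixel gives the matching location in the event image.

```python
from typing import List, Optional, Tuple

Point3 = Tuple[float, float, float]
Pixel = Tuple[float, float]


def flow_to_correspondences(
    depth: List[List[Optional[float]]],
    flow: List[List[Tuple[float, float]]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Tuple[Point3, Pixel]]:
    """Pair each valid depth pixel's 3D point with its flow-displaced event pixel.

    Hypothetical helper: `depth[v][u]` is metric depth (None where no LiDAR
    point projects), `flow[v][u]` is the (du, dv) displacement from the depth
    map to the event image, and (fx, fy, cx, cy) are pinhole intrinsics.
    """
    height, width = len(depth), len(depth[0])
    pairs: List[Tuple[Point3, Pixel]] = []
    for v in range(height):
        for u in range(width):
            d = depth[v][u]
            if d is None or d <= 0.0:
                continue  # no LiDAR return was projected onto this pixel
            # Back-project pixel (u, v) with depth d through the pinhole model.
            point = ((u - cx) * d / fx, (v - cy) * d / fy, d)
            du, dv = flow[v][u]
            target = (u + du, v + dv)
            # Keep only matches that land inside the event image.
            if 0.0 <= target[0] <= width - 1 and 0.0 <= target[1] <= height - 1:
                pairs.append((point, target))
    return pairs
```

The resulting list of (3D point, 2D pixel) pairs is the kind of input a PnP solver inside a RANSAC loop expects. Because the 3D points are expressed in the initial-guess camera frame, the recovered transform would be a correction to that guess under this sketch's assumptions.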

Paper Structure (20 sections, 8 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Our method (bottom) integrates a flow estimator with an edge detector within a mutually reinforcing cycle, enabling more accurate 2D–3D matching (i.e., event–depth flow) than the flow-only baseline EVLoc \cite{chen2025evloc} (top).
  • Figure 2: The three stages of the proposed method: 1) Image Generation: a depth map is generated using a pinhole camera model $D$ based on a given initial pose guess, and an event image is constructed from the raw event stream using an event image generator $E$; 2) Encoder: the depth map and event image are fed into the flow estimation branch (green module), and the depth map also serves as input to the edge detection branch (red module). Encoded depth features $F_D$ and edge features $F_{ED}$ are fused across scales via the CFF module to produce edge-aware representations; 3) Iterative Edge and Flow Prediction: these, along with event features $F_{EV}$ and correlation volumes $F_C$ (constructed by indexing the element-wise products of $F_D$ and $F_{EV}$ using the current flow; see Fig. \ref{fig:IFR}), are passed to the IFR module, where flow $\boldsymbol{f}$ and edge representations $F_{ED}$ are refined iteratively and jointly, ultimately producing flow estimates $\boldsymbol{f}^{1\rightarrow N}$ and edge estimates $p^{1\rightarrow N}$ from the corresponding decoders. The final flow estimate $\boldsymbol{f}^{N}$ is used to recover the camera pose via a PnP solver within a RANSAC loop. The yellow circles in the IFR module denote specific operations. See Sec. \ref{sec:ifr} and Fig. \ref{fig:IFR} for details.
  • Figure 3: Architecture of the CFF module. Both the edge detection and flow estimation branches employ five-layer encoders with different spatial resolutions at each layer. Feature fusion is performed at the middle three layers, as illustrated in the diagram.
  • Figure 4: Visualization of the encoded depth feature map. Each pixel value in the visualized feature map indicates the mean value of the corresponding feature vector at that location. The depth feature map after applying the CFF module contains richer texture details that align with structures.
  • Figure 5: Architecture of the IFR module. Edge features ($F_{ED}^{0} \to F_{ED}^{N}$) and flow estimates ($\boldsymbol{f}^{0} \to \boldsymbol{f}^{N}$) are progressively refined through mutual reinforcement. W, L, and C denote warping, indexing and concatenating, respectively.
  • ...and 2 more figures
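
The image-generation stage described in the Figure 2 caption, where a depth map is rendered from the LiDAR map with a pinhole camera model at an initial pose guess, can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the paper's code. The name `render_depth_map` is hypothetical, the points are assumed to be already transformed into the camera frame of the initial pose guess, and keeping the nearest point per pixel (a z-buffer) is an assumed way to resolve several points that fall on the same pixel.

```python
from typing import Iterable, List, Optional, Tuple


def render_depth_map(
    points_cam: Iterable[Tuple[float, float, float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    width: int,
    height: int,
) -> List[List[Optional[float]]]:
    """Project camera-frame 3D points into a sparse depth image.

    Hypothetical helper: pixels that receive no point stay None, and when
    several points hit the same pixel the closest one wins.
    """
    depth: List[List[Optional[float]]] = [[None] * width for _ in range(height)]
    for x, y, z in points_cam:
        if z <= 0.0:
            continue  # point lies behind the camera
        # Pinhole projection, rounded to the nearest pixel centre.
        u = int(round(fx * x / z + cx))
        v = int(round(fy * y / z + cy))
        if 0 <= u < width and 0 <= v < height:
            current = depth[v][u]
            if current is None or z < current:
                depth[v][u] = z  # keep the nearest surface
    return depth
```

A small usage example: with `fx = fy = 1` and `cx = cy = 1`, two points on the optical axis at depths 2 and 1 both map to pixel (1, 1), and the rendered value there is the smaller depth.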