AsynEVO: Asynchronous Event-Driven Visual Odometry for Pure Event Streams

Zhixiang Wang, Xudong Li, Yizhai Zhang, Panfeng Huang

TL;DR

This work tackles the challenge of estimating ego-motion from pure event streams at high temporal resolution. It introduces AsynEVO, a two-part system comprising an asynchronous event-driven frontend and a dynamic sliding-window GP backend on $SE(3)$, coupled with a dynamic marginalization strategy to preserve sparsity. Empirical results on public datasets and real-world flights show competitive accuracy, improved robustness, and substantially lower runtime than incremental GP-based approaches, particularly for high-speed motion and HDR environments. The method advances practical pure-event visual odometry and paves the way for real-time sensor fusion with stereo and inertial extensions.

Abstract

Event cameras are bio-inspired vision sensors that asynchronously measure per-pixel brightness changes. The high temporal resolution and asynchronicity of event cameras offer great potential for estimating robot motion states. Recent works have adopted continuous-time estimation methods to exploit the inherent nature of event cameras. However, existing methods either have poor runtime performance or neglect the high temporal resolution of event cameras. To address this, an Asynchronous Event-driven Visual Odometry (AsynEVO) based on sparse Gaussian Process (GP) regression is proposed to efficiently infer the motion trajectory from pure event streams. Concretely, an asynchronous frontend pipeline is designed to adapt event-driven feature tracking and manage feature trajectories; a parallel dynamic sliding-window backend is presented within the framework of sparse GP regression on $SE(3)$. Notably, a dynamic marginalization strategy is employed to ensure the consistency and sparsity of this GP regression. Experiments conducted on public datasets and real-world scenarios demonstrate that AsynEVO achieves competitive precision and superior robustness compared to the state of the art. The experiment in the repeated-texture scenario indicates that the high temporal resolution of AsynEVO plays a vital role in the estimation of high-speed movement. Furthermore, we show that the computational efficiency of AsynEVO significantly outperforms the incremental method.
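The abstract mentions marginalization that keeps the sliding-window problem consistent and sparse. As a rough illustration only, the sketch below marginalizes one variable from a Gaussian in information form using the standard Schur complement. It is not the authors' dynamic strategy: the scalar chain, the function name `marginalize_first`, and the numbers are all assumptions made for the example, whereas the paper works with $SE(3)$ states and landmarks.

```python
def marginalize_first(info, vec):
    """Marginalize variable 0 from a Gaussian given in information form.

    info: symmetric information matrix (list of lists), info[0][0] != 0
    vec:  information vector (list)
    Returns the information matrix and vector of the remaining variables,
    computed with the Schur complement of info[0][0].
    """
    a = info[0][0]
    n = len(info)
    new_info = [
        [info[i][j] - info[i][0] * info[0][j] / a for j in range(1, n)]
        for i in range(1, n)
    ]
    new_vec = [vec[i] - info[i][0] * vec[0] / a for i in range(1, n)]
    return new_info, new_vec


# Hypothetical 3-state chain: each state is linked only to its neighbor,
# so the information matrix is tridiagonal.
chain_info = [[2, -1, 0],
              [-1, 2, -1],
              [0, -1, 2]]
chain_vec = [1, 0, 0]

reduced_info, reduced_vec = marginalize_first(chain_info, chain_vec)
print(reduced_info)  # [[1.5, -1.0], [-1.0, 2.0]]
print(reduced_vec)   # [0.5, 0.0]
```

In this toy chain, removing the oldest state only changes the entry of its direct neighbor, so the reduced matrix stays as sparse as before. Marginalizing a variable connected to many others would instead create fill-in between them, which is the general reason a marginalization order or strategy matters for sparsity.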

Paper Structure (16 sections, 13 equations, 10 figures, 1 table, 1 algorithm)

Figures (10)

  • Figure 1: System pipeline. The first thread takes event measurements as input, tracks features with an asynchronous tracking method, and pushes the resulting feature trajectories to a queue. The second thread receives the feature trajectories, attempts to triangulate them, and adds them to the backend graph.
  • Figure 2: Illustration of feature trajectories. As the event camera moves through the scene and observes landmarks, the events triggered by the same landmark are managed as one feature trajectory.
  • Figure 3: Factor graph in the dynamic sliding window. (a) The original factor graph. (b) The factor graph after new landmarks and states are added. The marginalized zone is shaded in red. Note that the node surrounded by a white ring inside the marginalized zone is not marginalized. (c) The marginalized factor graph.
  • Figure 4: Repeated-texture scenario and estimation result. The event camera translates at $9\ \mathrm{m/s}$. AsynEVO estimates the true motion correctly, while the frame-based visual odometry produces a wrong result because of its limited temporal resolution. Note that the velocity of DSO is calculated by differencing neighboring poses.
  • Figure 5: Experimental scenarios and intermediate results. (A) Scenarios and event measurements. Dynamic 6dof and Indoor flying1 come from public datasets captured with DAVIS event cameras. (B) Sparse feature points detected and tracked by the proposed frontend, which form a large number of feature trajectories in 3D time-pixel-plane coordinates. (C) Estimated motion trajectories and the corresponding ground truth, together with the point cloud obtained as an intermediate product.
  • ...and 5 more figures
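
The Figure 4 caption notes that the DSO velocity is obtained by differencing neighboring poses. A minimal sketch of that idea follows, assuming timestamped positions only (orientation is ignored). The function name `finite_difference_velocity` and the sample values are hypothetical and are not taken from the paper.

```python
def finite_difference_velocity(poses):
    """Estimate linear velocity between consecutive timestamped positions.

    poses: list of (t, (x, y, z)) sorted by strictly increasing time t.
    Returns one velocity tuple per consecutive pair of poses.
    """
    velocities = []
    for (t0, p0), (t1, p1) in zip(poses, poses[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        velocities.append(tuple((b - a) / dt for a, b in zip(p0, p1)))
    return velocities


# Hypothetical poses: pure translation along x at 9 m/s, sampled every 0.5 s.
sample_poses = [
    (0.0, (0.0, 0.0, 0.0)),
    (0.5, (4.5, 0.0, 0.0)),
    (1.0, (9.0, 0.0, 0.0)),
]
sample_velocities = finite_difference_velocity(sample_poses)
print(sample_velocities)  # [(9.0, 0.0, 0.0), (9.0, 0.0, 0.0)]
```

A velocity computed this way is only as good as the poses it is derived from, so any error in the frame-based pose estimate carries over directly into the velocity.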