Table of Contents
Fetching ...

Tight Fusion of Events and Inertial Measurements for Direct Velocity Estimation

Wanting Xu, Xin Peng, Laurent Kneip

TL;DR

This work introduces a velocity-centric fusion of event-camera measurements with inertial data to directly estimate camera velocity, avoiding reliance on absolute pose or map coordinates. By employing a continuous event-line constraint (CELC) based on trifocal tensor geometry, it links asynchronous events from moving lines to translational velocity, and uses a nested two-layer RANSAC with a novel 4-event line solver to bootstrap velocity. The back-end then tightly couples event measurements with IMU pre-integration in a sliding window, including a consistency term to enforce cross-slice line coherence. Experiments on synthetic and real data show continuous, robust velocity estimates in highly dynamic conditions, with superior or more stable kinematic estimates than conventional frame-based visual-inertial odometry, especially when motion blur or low texture degrades images. This velocity-first paradigm offers a fail-safe sensor-fusion pathway for fast-acting control in UAVs and dynamic platforms, and opens avenues for integrating lines and points in global mapping while maintaining robust velocity signals.

Abstract

Traditional visual-inertial state estimation targets absolute camera poses and spatial landmark locations while first-order kinematics are typically resolved as an implicitly estimated sub-state. However, this poses a risk in velocity-based control scenarios, as the quality of the estimation of kinematics depends on the stability of absolute camera and landmark coordinates estimation. To address this issue, we propose a novel solution to tight visual-inertial fusion directly at the level of first-order kinematics by employing a dynamic vision sensor instead of a normal camera. More specifically, we leverage trifocal tensor geometry to establish an incidence relation that directly depends on events and camera velocity, and demonstrate how velocity estimates in highly dynamic situations can be obtained over short time intervals. Noise and outliers are dealt with using a nested two-layer RANSAC scheme. Additionally, smooth velocity signals are obtained from a tight fusion with pre-integrated inertial signals using a sliding window optimizer. Experiments on both simulated and real data demonstrate that the proposed tight event-inertial fusion leads to continuous and reliable velocity estimation in highly dynamic scenarios independently of absolute coordinates. Furthermore, in extreme cases, it achieves more stable and more accurate estimation of kinematics than traditional, point-position-based visual-inertial odometry.

Tight Fusion of Events and Inertial Measurements for Direct Velocity Estimation

TL;DR

This work introduces a velocity-centric fusion of event-camera measurements with inertial data to directly estimate camera velocity, avoiding reliance on absolute pose or map coordinates. By employing a continuous event-line constraint (CELC) based on trifocal tensor geometry, it links asynchronous events from moving lines to translational velocity, and uses a nested two-layer RANSAC with a novel 4-event line solver to bootstrap velocity. The back-end then tightly couples event measurements with IMU pre-integration in a sliding window, including a consistency term to enforce cross-slice line coherence. Experiments on synthetic and real data show continuous, robust velocity estimates in highly dynamic conditions, with superior or more stable kinematic estimates than conventional frame-based visual-inertial odometry, especially when motion blur or low texture degrades images. This velocity-first paradigm offers a fail-safe sensor-fusion pathway for fast-acting control in UAVs and dynamic platforms, and opens avenues for integrating lines and points in global mapping while maintaining robust velocity signals.

Abstract

Traditional visual-inertial state estimation targets absolute camera poses and spatial landmark locations while first-order kinematics are typically resolved as an implicitly estimated sub-state. However, this poses a risk in velocity-based control scenarios, as the quality of the estimation of kinematics depends on the stability of absolute camera and landmark coordinates estimation. To address this issue, we propose a novel solution to tight visual-inertial fusion directly at the level of first-order kinematics by employing a dynamic vision sensor instead of a normal camera. More specifically, we leverage trifocal tensor geometry to establish an incidence relation that directly depends on events and camera velocity, and demonstrate how velocity estimates in highly dynamic situations can be obtained over short time intervals. Noise and outliers are dealt with using a nested two-layer RANSAC scheme. Additionally, smooth velocity signals are obtained from a tight fusion with pre-integrated inertial signals using a sliding window optimizer. Experiments on both simulated and real data demonstrate that the proposed tight event-inertial fusion leads to continuous and reliable velocity estimation in highly dynamic scenarios independently of absolute coordinates. Furthermore, in extreme cases, it achieves more stable and more accurate estimation of kinematics than traditional, point-position-based visual-inertial odometry.
Paper Structure (32 sections, 32 equations, 14 figures, 10 tables, 2 algorithms)

This paper contains 32 sections, 32 equations, 14 figures, 10 tables, 2 algorithms.

Figures (14)

  • Figure 1: Example of a local event stream pattern used for camera velocity estimation. Each cluster of events corresponds to a locally observed, real-world line segment.
  • Figure 2: Geometry of CELC which indicates the relationship between events, lines and camera ego-motion. In our work, we extract the intermediate line representations $\mathbf{l}_{sj}$ and $\mathbf{l}_{ej}$ by choosing two events in a small time interval $\Delta t$ at the beginning and at the end of the interval.
  • Figure 3: The geometry of Plücker line coordinates.
  • Figure 4: Geometry of the Plücker coordinates based line solver. The unknowns are the Plücker line, given by $\mathbf{d}$ and $\mathbf{m} = \mathbf{r} \times \mathbf{d}$, $\mathbf{r}$ is the position of the 3D point with respect to $c_s$. The measured or known variables are the landmark observation vector $f_i^{\omega}$ and the position of the camera center $c_i$ at timestamp $t_i$ with respect to $c_s$, given by $\mathbf{t}_{si}$.
  • Figure 5: Flowchart of the proposed inlier metric.
  • ...and 9 more figures