EVIT: Event-based Visual-Inertial Tracking in Semi-Dense Maps Using Windowed Nonlinear Optimization
Runze Yuan, Tao Liu, Zijia Dai, Yi-Fan Zuo, Laurent Kneip
TL;DR
EVIT addresses robust ego-motion tracking for event cameras when a prior semi-dense map is available, targeting sequences with challenging dynamics and illumination. The method fuses IMU pre-integration with time-surface-map (TSM) based edge alignment in a sliding-window nonlinear optimization, treating multiple keyframes as a virtual, elastic multi-camera rig with inter-frame constraints. Key contributions include adaptive keyframe generation, a time-surface-based event representation, tightly coupled visual-inertial initialization, and a sliding-window back-end that integrates IMU and TSM observations. Experiments on the VECtor dataset show that EVIT outperforms purely event-based methods, especially on dynamic sequences, while lowering the rate at which intermediate event representations must be registered and maintaining real-time capability. The approach also generalizes from event cameras to regular cameras through similar windowed tracking.
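To make the time-surface representation mentioned above more concrete, the following is a minimal sketch of one common way to build a time surface from a batch of events. It is an illustration under stated assumptions, not the authors' implementation: the exponential-decay form, the function name `time_surface`, the `(x, y, t)` event layout, and the decay constant `tau` are all assumed here, and event polarity is ignored.

```python
import numpy as np

def time_surface(events, shape, t_ref, tau=0.03):
    """Build an exponentially decayed time surface (illustrative sketch).

    events: iterable of (x, y, t) tuples, with t in seconds (assumed layout).
    shape:  (height, width) of the sensor.
    t_ref:  reference time at which the surface is evaluated.
    tau:    decay constant in seconds (hypothetical default).
    """
    height, width = shape
    # Timestamp of the most recent event per pixel; -inf means "no event yet".
    last_t = np.full((height, width), -np.inf)
    for x, y, t in events:
        if t <= t_ref:  # ignore events that occur after the reference time
            last_t[y, x] = max(last_t[y, x], t)
    # Recent events map to values near 1; pixels without events map to 0.
    return np.exp(-(t_ref - last_t) / tau)

# Usage: two events on a 2x3 sensor, evaluated at the time of the second event.
events = [(1, 0, 0.00), (2, 1, 0.03)]
ts = time_surface(events, shape=(2, 3), t_ref=0.03, tau=0.03)
```

In a surface of this kind, pixels that fired recently carry large values, so moving edges appear as ridges, which is the sort of structure that edge alignment against a semi-dense map can exploit.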
Abstract
Event cameras are visual exteroceptive sensors that react to brightness changes rather than integrating absolute image intensities. Owing to this design, they exhibit strong performance in situations of challenging dynamics and illumination conditions. While event-based simultaneous tracking and mapping remains a challenging problem, a number of recent works have pointed out the sensor's suitability for prior map-based tracking. By making use of cross-modal registration paradigms, the camera's ego-motion can be tracked across a large spectrum of illumination and dynamics conditions on top of accurate maps that have been created a priori by more traditional sensors. The present paper follows up on a recently introduced event-based geometric semi-dense tracking paradigm and proposes the addition of inertial signals in order to robustify the estimation. More specifically, the added signals provide strong cues for pose initialization as well as regularization during windowed, multi-frame tracking. As a result, the proposed framework achieves increased performance under challenging illumination conditions, as well as a reduction of the rate at which intermediate event representations need to be registered in order to maintain stable tracking across highly dynamic sequences. Our evaluation focuses on a diverse set of real-world sequences and comprises a comparison of our proposed method against a purely event-based alternative running at different rates.
