Data-Driven Feature Tracking for Event Cameras With and Without Frames
Nico Messikommer, Carter Fang, Mathias Gehrig, Giovanni Cioffi, Davide Scaramuzza
TL;DR
This work introduces the first data-driven feature tracker for event cameras, which uses the high temporal resolution of events to overcome limitations of frame-based tracking. A novel frame attention module shares information across all feature tracks. The tracker runs either on events alone or in a hybrid event-frame mode, where the two cameras are either aligned (same viewpoint) or placed side-by-side as a stereo pair for sparse disparity estimation. Training combines synthetic supervision on Multiflow with pose-based self-supervision to bridge the sim-to-real gap, and the tracker is evaluated on the EC and EDS datasets. The reported results show better tracking performance and lower runtime than state-of-the-art baselines. The tracker also extends to disparity estimation and can be combined with frame-based trackers in VO/SLAM pipelines.
Abstract
Because of their high temporal resolution, increased resilience to motion blur, and very sparse output, event cameras have been shown to be ideal for low-latency and low-bandwidth feature tracking, even in challenging scenarios. Existing feature tracking methods for event cameras are either handcrafted or derived from first principles but require extensive parameter tuning, are sensitive to noise, and do not generalize to different scenarios due to unmodeled effects. To tackle these deficiencies, we introduce the first data-driven feature tracker for event cameras, which leverages low-latency events to track features detected in an intensity frame. We achieve robust performance via a novel frame attention module, which shares information across feature tracks. Our tracker is designed to operate in two distinct configurations: solely with events or in a hybrid mode incorporating both events and frames. The hybrid model offers two setups: an aligned configuration where the event and frame cameras share the same viewpoint, and a hybrid stereo configuration where the event camera and the standard camera are positioned side-by-side. This side-by-side arrangement is particularly valuable as it provides depth information for each feature track, enhancing its utility in applications such as visual odometry and simultaneous localization and mapping.
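To illustrate why the side-by-side configuration yields depth per feature track, the sketch below converts sparse per-track disparities to depths using the standard rectified-stereo relation depth = focal length × baseline / disparity. This is not the authors' implementation: it assumes a rectified pinhole stereo pair, and the function name, parameter names, and numeric values are hypothetical.

```python
import math


def disparity_to_depth(disparities_px, focal_px, baseline_m, min_disparity_px=1e-6):
    """Convert per-track disparities (pixels) to depths (meters).

    Assumes a rectified pinhole stereo pair (an assumption, not stated in the
    paper summary): depth = focal_px * baseline_m / disparity.
    Tracks with missing or near-zero disparity get depth = infinity.
    """
    depths = []
    for d in disparities_px:
        if d is None or d <= min_disparity_px:
            # No usable disparity: the point is treated as infinitely far away.
            depths.append(math.inf)
        else:
            depths.append(focal_px * baseline_m / d)
    return depths


if __name__ == "__main__":
    # Hypothetical values: three tracked features, one without valid disparity.
    track_disparities = [10.0, 4.0, 0.0]
    print(disparity_to_depth(track_disparities, focal_px=200.0, baseline_m=0.5))
```

Because depth is inversely proportional to disparity, small disparity errors on distant features translate into large depth errors, which is one reason a downstream VO/SLAM pipeline would typically weight or filter far-away tracks.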
