ETAP: Event-based Tracking of Any Point
Friedhelm Hamann, Daniel Gehrig, Filbert Febryanto, Kostas Daniilidis, Guillermo Gallego
TL;DR
ETAP introduces the first purely event-based method for tracking any point (TAP) using event cameras, addressing limitations of frame-based sensors under challenging lighting and high-speed motion. The approach combines event-stack representations, multi-scale features, and a transformer-based refinement loop with a novel motion-robust feature-alignment loss that enforces descriptor consistency across time and motion. A new synthetic dataset, EventKubric, together with ground truth for EVIMO2 and E2D2, enables large-scale training and rigorous evaluation. ETAP achieves state-of-the-art performance on TAP and feature-tracking benchmarks, and generalizes across datasets to diverse camera types and resolutions. The results demonstrate strong tracking, occlusion handling, and feature stability in challenging regimes, highlighting the practicality of purely event-based tracking for robotics and perception in HDR and high-speed scenarios.
Abstract
Tracking any point (TAP) recently shifted the motion estimation paradigm from focusing on individual salient points with local templates to tracking arbitrary points with global image contexts. However, research has mostly focused on improving model accuracy in nominal settings, and scenarios with difficult lighting conditions and high-speed motions remain out of reach due to the limitations of frame-based sensors. This work addresses this challenge with the first event camera-based TAP method. It leverages the high temporal resolution and high dynamic range of event cameras for robust high-speed tracking, and the global contexts in TAP methods to handle asynchronous and sparse event measurements. We further extend the TAP framework to handle event feature variations induced by motion, an open challenge in purely event-based tracking, with a novel feature-alignment loss that ensures the learning of motion-robust features. Our method is trained with data from a new data generation pipeline and systematically ablated across all design decisions. It shows strong cross-dataset generalization and performs 136% better on the average Jaccard metric than the baselines. Moreover, on an established feature tracking benchmark, it achieves a 20% improvement over the previous best event-only method and even surpasses the previous best events-and-frames method by 4.1%. Our code is available at https://github.com/tub-rip/ETAP
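
For readers unfamiliar with the average Jaccard metric mentioned above, the following is a minimal sketch of how it is commonly computed in TAP benchmarks: a prediction counts as a true positive if the point is visible in the ground truth, predicted visible, and within a pixel threshold, and the Jaccard score is averaged over several thresholds. The function name `average_jaccard`, the flat-list input layout, and the thresholds of 1, 2, 4, 8, and 16 pixels are assumptions made for illustration; the evaluation protocol used in the paper may differ in its details.

```python
from math import hypot

# Assumed pixel thresholds; the paper's evaluation settings may differ.
THRESHOLDS = (1, 2, 4, 8, 16)


def average_jaccard(gt_points, gt_visible, pred_points, pred_visible,
                    thresholds=THRESHOLDS):
    """Illustrative average Jaccard over flat lists of per-sample values.

    Each list holds one entry per (track, time step) sample:
    points are (x, y) tuples, visibility flags are booleans.
    """
    scores = []
    for thr in thresholds:
        true_pos = 0   # visible in GT, predicted visible, within threshold
        false_pos = 0  # predicted visible, but GT occluded or too far away
        gt_pos = 0     # all samples visible in GT
        samples = zip(gt_points, gt_visible, pred_points, pred_visible)
        for (gx, gy), gv, (px, py), pv in samples:
            within = hypot(px - gx, py - gy) < thr
            if gv:
                gt_pos += 1
            if pv and gv and within:
                true_pos += 1
            elif pv:
                false_pos += 1
        # gt_pos already includes true positives and false negatives.
        denom = gt_pos + false_pos
        scores.append(true_pos / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because the metric penalizes both position errors and wrong visibility predictions, a tracker must handle occlusions as well as localization to score well.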
