FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events
Xiangyuan Wang, Kuangyi Chen, Wen Yang, Lei Yu, Yannan Xing, Huai Yu
TL;DR
FE-DeTr addresses robust keypoint detection and tracking in low-quality image frames by fusing image frames with event streams. The method introduces a Fusion Feature Extractor, a Motion Extractor, and a Motion-Aware Head to produce temporally consistent heatmaps across multiple time instants, supervised by a temporal response consistency objective and refined with deformable warping. A Consistency Peaky Loss and a new Extreme Corner dataset enable stable long-term tracking under extreme conditions, as demonstrated in extensive experiments against frame-based and event-based baselines. The results show that FE-DeTr achieves high localization accuracy and stable tracking, highlighting its potential for robust SLAM and SfM in challenging environments; future work will extend the method to downstream robotic perception tasks.
Abstract
Keypoint detection and tracking in traditional image frames are often compromised by image quality issues such as motion blur and extreme lighting conditions. Event cameras offer a potential solution to these challenges by virtue of their high temporal resolution and high dynamic range. However, their performance in practical applications is limited by the inherent noise in event data. This paper advocates fusing the complementary information from image frames and event streams to achieve more robust keypoint detection and tracking. Specifically, we propose FE-DeTr, a novel keypoint detection network that fuses the textural and structural information from image frames with the high-temporal-resolution motion information from event streams. The network leverages temporal response consistency for supervision, ensuring stable and efficient keypoint detection. Moreover, we use a spatio-temporal nearest-neighbor search strategy for robust keypoint tracking. Extensive experiments are conducted on a new dataset featuring both image frames and event data captured under extreme conditions. The experimental results confirm the superior performance of our method over both existing frame-based and event-based methods.
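The abstract does not spell out the spatio-temporal nearest-neighbor search, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes keypoints have already been detected at a sequence of closely spaced time instants, and it links each active track to the nearest unused detection at the next instant if that detection lies within a spatial gate. The function name `track_keypoints`, the `max_dist` parameter, and the greedy matching order are all hypothetical choices made for illustration.

```python
import numpy as np


def track_keypoints(detections, max_dist=3.0):
    """Greedily link per-instant keypoint detections into tracks.

    detections: list of (N_t, 2) array-likes of (x, y) positions,
                one entry per time instant, in temporal order.
    max_dist:   hypothetical spatial gate in pixels (an assumption,
                not a value taken from the paper).
    Returns a list of tracks, each a list of (t_index, x, y) tuples.
    """
    tracks = []   # every track ever created
    active = []   # indices of tracks extended at the previous instant
    for t, pts in enumerate(detections):
        pts = np.asarray(pts, dtype=float).reshape(-1, 2)
        used = np.zeros(len(pts), dtype=bool)
        next_active = []
        for ti in active:
            if len(pts) == 0:
                break  # nothing to match at this instant
            _, x, y = tracks[ti][-1]
            # Euclidean distance from the track's last position
            # to every detection at the current instant.
            d = np.hypot(pts[:, 0] - x, pts[:, 1] - y)
            d[used] = np.inf  # each detection extends at most one track
            j = int(np.argmin(d))
            if d[j] <= max_dist:
                tracks[ti].append((t, float(pts[j, 0]), float(pts[j, 1])))
                used[j] = True
                next_active.append(ti)
        # Unmatched detections start new tracks.
        for j in np.flatnonzero(~used):
            tracks.append([(t, float(pts[j, 0]), float(pts[j, 1]))])
            next_active.append(len(tracks) - 1)
        active = next_active
    return tracks
```

In this sketch a track ends as soon as no detection falls inside the gate at the next instant. A small gate is plausible only because events allow detections at finely spaced time instants, so inter-instant displacement stays small; the actual gating and matching rules used by FE-DeTr may differ.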
