FE-DeTr: Keypoint Detection and Tracking in Low-quality Image Frames with Events

Xiangyuan Wang, Kuangyi Chen, Wen Yang, Lei Yu, Yannan Xing, Huai Yu

TL;DR

FE-DeTr addresses robust keypoint detection and tracking in low-quality image frames by fusing image frames with event streams. The method introduces a Fusion Feature Extractor, Motion Extractor, and Motion-Aware Head to produce temporally consistent heatmaps across multiple time instants, supervised by a temporal response consistency objective and refined with deformable warping. A Consistency Peaky Loss and a new Extreme Corner dataset enable stable long-term tracking under extreme conditions, demonstrated via extensive experiments against frame-based and event-based baselines. The results show FE-DeTr achieves high localization accuracy and stable tracking, highlighting its potential for robust SLAM and SfM in challenging environments, with future work extending to downstream robotic perception tasks.

Abstract

Keypoint detection and tracking in traditional image frames are often compromised by image quality issues such as motion blur and extreme lighting conditions. Event cameras offer potential solutions to these challenges by virtue of their high temporal resolution and high dynamic range. However, their performance in practical applications is limited by the noise inherent in event data. This paper advocates fusing the complementary information from image frames and event streams to achieve more robust keypoint detection and tracking. Specifically, we propose FE-DeTr, a novel keypoint detection network that fuses the textural and structural information from image frames with the high-temporal-resolution motion information from event streams. The network leverages temporal response consistency for supervision, ensuring stable and efficient keypoint detection. Moreover, we use a spatio-temporal nearest-neighbor search strategy for robust keypoint tracking. Extensive experiments are conducted on a new dataset featuring both image frames and event data captured under extreme conditions. The experimental results confirm the superior performance of our method over both existing frame-based and event-based methods.
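
The abstract names a spatio-temporal nearest-neighbor search for tracking but does not spell out its details here. The sketch below is one generic way such a search could look, not the authors' implementation: keypoints detected at successive time instants are greedily linked to the closest unclaimed detection within a pixel radius, and a track survives a few missed slices before it is closed. The function name track_keypoints and the parameters max_dist and max_gap are assumptions introduced for illustration.

```python
import numpy as np

def track_keypoints(detections, max_dist=3.0, max_gap=2):
    """Greedy spatio-temporal nearest-neighbor linking (illustrative sketch).

    detections: list of (N_t, 2) arrays of (x, y) keypoints, one per time instant.
    max_dist:   hypothetical spatial search radius in pixels.
    max_gap:    hypothetical number of consecutive slices a track may go unmatched.
    Returns a list of tracks, each a list of (t, x, y) tuples.
    """
    active, finished = [], []
    for t, pts in enumerate(detections):
        pts = np.asarray(pts, dtype=float).reshape(-1, 2)
        used = np.zeros(len(pts), dtype=bool)  # detections already claimed at time t
        still_active = []
        for track in active:
            t_last, x_last, y_last = track[-1]
            matched = False
            if len(pts) > 0:
                # Distance from the track's last position to every detection.
                d = np.hypot(pts[:, 0] - x_last, pts[:, 1] - y_last)
                d[used] = np.inf
                j = int(np.argmin(d))
                if d[j] <= max_dist:
                    track.append((t, float(pts[j, 0]), float(pts[j, 1])))
                    used[j] = True
                    matched = True
            # Keep the track if it matched or has not been missing for too long.
            if matched or t - t_last <= max_gap:
                still_active.append(track)
            else:
                finished.append(track)
        # Every unclaimed detection starts a new track.
        for j in np.flatnonzero(~used):
            still_active.append([(t, float(pts[j, 0]), float(pts[j, 1]))])
        active = still_active
    return finished + active

# Toy input: two keypoints drifting slowly; the second is missed at t = 2.
detections = [
    np.array([[10, 10], [50, 50]]),
    np.array([[11, 10], [50, 51]]),
    np.array([[12, 11]]),
    np.array([[13, 11], [50, 52]]),
]
tracks = track_keypoints(detections)
```

In this toy input the second keypoint is absent at one time instant, and the temporal tolerance (max_gap) lets its track resume at the next instant instead of starting a new one. A greedy match is order-dependent; a globally optimal assignment would be a different design choice and is not claimed by the source.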

Paper Structure (21 sections, 10 equations, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Our method (bottom) leverages the complementary characteristics of image frames and event streams, allowing for stable keypoint detection and tracking in extreme conditions compared to frame-based methods (top) and event-based methods (middle).
  • Figure 2: Overview of the proposed FE-DeTr. For each frame interval, an event representation is generated and combined with the image frame as input to the keypoint detection network. The network outputs a sequence of uniformly spaced heatmaps.
  • Figure 3: Comparison of tracking trajectories under different conditions: blur (1st row), overexposure (2nd row), dark (3rd row), and HDR (4th row).
  • Figure 4: Reference image frame (left); output heatmap after training without $L_{cp}$ (middle); output heatmap after $L_{cp}$ supervision (right).