Tracking Any Point with Frame-Event Fusion Network at High Frame Rate

Jiaxiong Liu; Bo Wang; Zhen Tan; Jinpu Zhang; Hui Shen; Dewen Hu

TL;DR

This paper proposes FE-TAP, an image-event fusion point tracker that combines the contextual information of image frames with the high temporal resolution of events to achieve high-frame-rate, robust point tracking under various challenging conditions.

Abstract

Tracking any point based on image frames is constrained by frame rates, leading to instability in high-speed scenarios and limited generalization in real-world applications. To overcome these limitations, we propose an image-event fusion point tracker, FE-TAP, which combines the contextual information from image frames with the high temporal resolution of events, achieving high frame rate and robust point tracking under various challenging conditions. Specifically, we designed an Evolution Fusion module (EvoFusion) to model the image generation process guided by events. This module can effectively integrate valuable information from both modalities operating at different frequencies. To achieve smoother point trajectories, we employed a transformer-based refinement strategy that iteratively updates the trajectories and features of the tracked points. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches, particularly improving expected feature age by 24% on the EDS dataset. Finally, we qualitatively validated the robustness of our algorithm in real driving scenarios using our custom-designed high-resolution image-event synchronization device. Our source code will be released at https://github.com/ljx1002/FE-TAP.

Paper Structure (16 sections, 3 equations, 5 figures, 2 tables)

INTRODUCTION
RELATED WORK
Frame-Based methods
Event-Based methods
METHOD
Event representation
EvoFusion
Query Preparation
Iterative Refinement
Experiments
Implementation Details
Datasets and Metrics
Result Comparisons
Ablation Study
Results in Driving Scenarios
...and 1 more sections

Figures (5)

Figure 1: Comparison of tracking performance in high-speed motion scenarios: our method (top right), which integrates image and event data, vs. data-driven methods (top left), which rely on the first image frame and event data.
Figure 2: The overview of FE-TAP. EvoFusion module fuses image and event data with different frame rates using an appropriate data selection strategy. The query preparation module computes cost volumes based on the fused feature maps. The iterative update module takes these elements as input and optimizes all point query trajectories in parallel within a sliding window, producing high-frequency point tracks.
Figure 3: Qualitative tracking predictions (red) and ground truth tracks (green) for the EC dataset (1st and 2nd columns) and the EDS dataset (3rd and 4th columns). We discard predicted trajectories if they deviate significantly from the ground truth trajectory.
Figure 4: Comparison of our method and the data-driven tracker of Messikommer et al. (CVPR 2023) under occlusions.
Figure 5: (a) Custom-designed image-event synchronization device. We validated the performance of our tracker in real-world driving scenarios, including urban road (b) and tunnel (c) environments.
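
The Figure 2 caption mentions cost volumes computed from the fused feature maps. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below computes a dot-product correlation map between a single query feature and every pixel of a small feature map. The function names `correlation_map` and `best_match`, the `1/sqrt(C)` scaling, and the toy data are all assumptions made for this example.

```python
import math

def correlation_map(query, feature_map):
    """Scaled dot product between one query feature and every pixel.

    query:       list of C floats (feature sampled at the query point)
    feature_map: H x W x C nested lists (stand-in for a fused feature map)
    returns:     H x W nested list of similarity scores
    """
    # 1/sqrt(C) scaling is a common normalisation; assumed here, not from the paper.
    scale = 1.0 / math.sqrt(len(query))
    return [
        [scale * sum(q * f for q, f in zip(query, pixel)) for pixel in row]
        for row in feature_map
    ]

def best_match(corr):
    """Return the (row, col) position with the highest similarity."""
    height, width = len(corr), len(corr[0])
    return max(
        ((r, c) for r in range(height) for c in range(width)),
        key=lambda rc: corr[rc[0]][rc[1]],
    )

# Toy 3x3 feature map with 2 channels; the query resembles pixel (1, 2).
toy_map = [
    [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1]],
    [[0.1, 0.2], [0.0, 0.0], [1.0, 1.0]],
    [[0.3, 0.0], [0.1, 0.1], [0.0, 0.2]],
]
toy_query = [1.0, 1.0]

corr = correlation_map(toy_query, toy_map)
print(best_match(corr))  # (1, 2)
```

In a tracker of this kind, such a map would typically be computed per query point and per time step, then passed to the refinement stage; how FE-TAP samples and stacks these maps is not specified in this summary.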
