Multi-step Temporal Modeling for UAV Tracking

Xiaoying Yuan, Tingfa Xu, Xincong Liu, Ying Wang, Haolin Qin, Yuqiang Fang, Jianan Li

TL;DR

MT-Track is a streamlined and efficient multi-step temporal modeling framework that harnesses temporal context from historical frames to improve UAV tracking. It includes a mutual transformer module that refines the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence.

Abstract

In the realm of unmanned aerial vehicle (UAV) tracking, Siamese-based approaches have gained traction due to their optimal balance between efficiency and precision. However, UAV scenarios often present challenges such as insufficient sampling resolution, fast motion and small objects with limited feature information. As a result, temporal context in UAV tracking tasks plays a pivotal role in target location, overshadowing the target's precise features. In this paper, we introduce MT-Track, a streamlined and efficient multi-step temporal modeling framework designed to harness the temporal context from historical frames for enhanced UAV tracking. This temporal integration occurs in two steps: correlation map generation and correlation map refinement. Specifically, we unveil a unique temporal correlation module that dynamically assesses the interplay between the template and search region features. This module leverages temporal information to refresh the template feature, yielding a more precise correlation map. Subsequently, we propose a mutual transformer module to refine the correlation maps of historical and current frames by modeling the temporal knowledge in the tracking sequence. This method significantly trims computational demands compared to the raw transformer. The compact yet potent nature of our tracking framework ensures commendable tracking outcomes, particularly in extended tracking scenarios.
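The refinement step described above, in which the correlation maps of historical and current frames refine each other, can be illustrated with plain cross-attention. The following is only a minimal sketch under stated assumptions, not the paper's mutual transformer: correlation maps are assumed to be flattened into token lists, there are no learned projections, and the function names `cross_attention` and `mutual_refine` are hypothetical.

```python
import math
from typing import List, Tuple

Tokens = List[List[float]]  # one feature vector per spatial position


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Tokens, keys_values: Tokens) -> Tokens:
    """Scaled dot-product attention; keys double as values (no learned weights)."""
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
                  for k in keys_values]
        weights = softmax(scores)
        attended.append([sum(w * kv[c] for w, kv in zip(weights, keys_values))
                         for c in range(dim)])
    return attended


def mutual_refine(current: Tokens, history: Tokens) -> Tuple[Tokens, Tokens]:
    """Each branch attends to the other, then adds the result as a residual."""
    cur_ctx = cross_attention(current, history)   # current queries history
    his_ctx = cross_attention(history, current)   # history queries current
    refined_cur = [[x + y for x, y in zip(t, c)] for t, c in zip(current, cur_ctx)]
    refined_his = [[x + y for x, y in zip(t, c)] for t, c in zip(history, his_ctx)]
    return refined_cur, refined_his
```

Both branches keep their own token count, so each refined map has the same shape as its input. A real module would add learned query, key, and value projections plus normalization and feed-forward layers.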

Paper Structure (20 sections, 12 equations, 14 figures, 6 tables)

Figures (14)

  • Figure 1: (a) Overview of MT-Track. The proposed temporal modeling module exploits temporal information at two steps: correlation map generation by temporal correlation, and correlation map refinement by mutual transformer. (b) Accuracy-speed trade-off on DTB70. Our MT-Track achieves competitive performance with impressive efficiency due to the full use of temporal information.
  • Figure 2: (a) Overall architecture of MT-Track. First, the features extracted from the backbone are fed into temporal correlation to dynamically update the template feature and produce accurate correlation maps. The Multi-Template Fusion module (MTF) is described in detail in Fig. 3. Next, the mutual transformer takes the correlation maps, models temporal knowledge, and produces the refined correlation maps. (b) Mutual Transformer. It is an encoder-decoder architecture with parallel historical and current branches. Mutual attention (MA) is introduced to mutually refine the correlation maps.
  • Figure 3: Workflow of Temporal Correlation. The Multi-Template Fusion (MTF) updates the template feature by fusing the temporal information of the feature sequences. The updated template feature is then used to create the correlation map by depth-wise correlation.
  • Figure 4: Overall tracking performance. Success and Precision plots of all trackers on DTB70, UAV123, and UAV123@10fps. Our tracker achieves leading performance against most SoTA trackers.
  • Figure 5: Attribute-based evaluation. Success plots by attribute for all trackers on DTB70, UAV123, and UAV123@10fps. Compared with the other 12 trackers, MT-Track maintains promising performance under deformation, camera motion, background clutter, and similar objects.
  • ...and 9 more figures
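
The Figure 3 caption describes two operations: fusing template features over time, then applying depth-wise correlation with the search-region feature. The sketch below illustrates only the general idea. The weighted average in `fuse_templates` is an assumed stand-in for the Multi-Template Fusion module, whose actual design is not given here, and all function names are hypothetical. Features are nested lists indexed as `[channel][row][col]`.

```python
from typing import List

Feature = List[List[List[float]]]  # [channel][row][col]


def fuse_templates(templates: List[Feature], weights: List[float]) -> Feature:
    """Weighted average of template features (assumed stand-in for MTF)."""
    total = sum(weights)
    channels = len(templates[0])
    rows = len(templates[0][0])
    cols = len(templates[0][0][0])
    return [[[sum(w * t[c][i][j] for t, w in zip(templates, weights)) / total
              for j in range(cols)]
             for i in range(rows)]
            for c in range(channels)]


def depthwise_correlation(search: Feature, template: Feature) -> Feature:
    """Slide each template channel over the matching search channel only."""
    response = []
    for s_ch, t_ch in zip(search, template):
        t_h, t_w = len(t_ch), len(t_ch[0])
        s_h, s_w = len(s_ch), len(s_ch[0])
        channel_map = []
        for i in range(s_h - t_h + 1):
            row = []
            for j in range(s_w - t_w + 1):
                # Sum of element-wise products inside the current window.
                row.append(sum(s_ch[i + a][j + b] * t_ch[a][b]
                               for a in range(t_h) for b in range(t_w)))
            channel_map.append(row)
        response.append(channel_map)
    return response


# Toy usage: one channel, a 3x3 search feature, and two 2x2 templates.
search = [[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]]
initial = [[[1.0, 1.0], [1.0, 1.0]]]
recent = [[[3.0, 3.0], [3.0, 3.0]]]
fused = fuse_templates([initial, recent], weights=[1.0, 1.0])
corr_map = depthwise_correlation(search, fused)
```

Because channels are never mixed, the correlation map keeps one response channel per feature channel. This is what distinguishes depth-wise correlation from a full cross-correlation, which sums over channels.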