FocusTrack: A Self-Adaptive Local Sampling Algorithm for Efficient Anti-UAV Tracking

Ying Wang, Tingfa Xu, Jianan Li

TL;DR

FocusTrack tackles Anti-UAV tracking by balancing local efficiency with global robustness through two components: a Search Region Adjustment (SRA) strategy that estimates target presence to adapt the field of view, and an Attention-to-Mask (ATM) module that fuses hierarchical features into fine-grained masks. Built on a Transformer-based local tracking backbone, the approach combines a CLS_Token for presence estimation, a contrastive frame sampling strategy, and multi-layer cross-attention to stabilize tracking under abrupt camera motion and cluttered infrared backgrounds. On AntiUAV and AntiUAV410 it reports state-of-the-art results (67.7% and 62.8% AUC) at 143 fps for the SRA-only variant and 44 fps with the full ATM module, outperforming several local trackers while remaining more efficient than global re-detection methods. Overall, FocusTrack offers a practical, real-time Anti-UAV tracking solution that combines adaptive search behavior with refined feature representations, with potential extensions to segmentation-guided robustness and richer data sources.

Abstract

Anti-UAV tracking poses significant challenges, including small target sizes, abrupt camera motion, and cluttered infrared backgrounds. Existing tracking paradigms can be broadly categorized into global- and local-based methods. Global-based trackers, such as SiamDT, achieve high accuracy by scanning the entire field of view but suffer from excessive computational overhead, limiting real-world deployment. In contrast, local-based methods, including OSTrack and ROMTrack, efficiently restrict the search region but struggle when targets undergo significant displacements due to abrupt camera motion. Through preliminary experiments, it is evident that a local tracker, when paired with adaptive search region adjustment, can significantly enhance tracking accuracy, narrowing the gap between local and global trackers. To address this challenge, we propose FocusTrack, a novel framework that dynamically refines the search region and strengthens feature representations, achieving an optimal balance between computational efficiency and tracking accuracy. Specifically, our Search Region Adjustment (SRA) strategy estimates the target presence probability and adaptively adjusts the field of view, ensuring the target remains within focus. Furthermore, to counteract feature degradation caused by varying search regions, the Attention-to-Mask (ATM) module is proposed. This module integrates hierarchical information, enriching the target representations with fine-grained details. Experimental results demonstrate that FocusTrack achieves state-of-the-art performance, obtaining 67.7% AUC on AntiUAV and 62.8% AUC on AntiUAV410, outperforming the baseline tracker by 8.5% and 9.1% AUC, respectively. In terms of efficiency, FocusTrack surpasses global-based trackers, requiring only 30G MACs and achieving 143 fps with FocusTrack (SRA) and 44 fps with the full version, both enabling real-time tracking.
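The SRA idea described above (estimate a target presence probability, then widen or restore the field of view) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `adjust_search_factor`, the threshold, the growth rate, and the factor bounds are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SearchRegionState:
    """Holds the search factor carried from one frame to the next."""
    factor: float = 4.0  # assumed starting factor (hypothetical default)


def adjust_search_factor(
    state: SearchRegionState,
    presence_prob: float,
    threshold: float = 0.5,     # assumed presence threshold
    growth: float = 1.5,        # assumed multiplicative enlargement per frame
    base_factor: float = 4.0,   # assumed default local field of view
    max_factor: float = 12.0,   # assumed upper bound on the field of view
) -> float:
    """Enlarge the search region while the target looks absent; reset otherwise."""
    if not 0.0 <= presence_prob <= 1.0:
        raise ValueError("presence_prob must lie in [0, 1]")
    if presence_prob < threshold:
        # Target is probably outside the crop: widen the view, up to the cap.
        state.factor = min(state.factor * growth, max_factor)
    else:
        # Target is in view again: return to the efficient local crop.
        state.factor = base_factor
    return state.factor


if __name__ == "__main__":
    state = SearchRegionState()
    for prob in (0.9, 0.2, 0.1, 0.8):
        print(prob, adjust_search_factor(state, prob))
```

Under these assumed settings the factor grows only while the presence estimate stays low, so the extra cost of a wide search is paid only on the frames where the target appears to have left the local crop.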

Paper Structure

This paper contains 38 sections, 3 equations, 13 figures, 8 tables, 1 algorithm.

Figures (13)

  • Figure 1: Illustration of camera motion challenge in Anti-UAV tracking. (a) and (b) show the target’s horizontal position over time in two AntiUAV410 sequences: (a) with smooth motion and (b) with abrupt displacements. (c) visualizes a displacement at frame 430 in (b), where the green box marks the target and the red box the search region. From frames 431–433, FocusTrack adaptively adjusts the region to successfully reacquire the target.
  • Figure 2: Preliminary experiments on OSTrack to explore the impact of search factors. The red solid line shows tracking performance (AUC) on AntiUAV410, while the blue dashed line represents the relative feature proportion.
  • Figure 3: (a) Overall structure of FocusTrack. (b) Detailed explanation of the Adaptive Search Region Adjustment Strategy. FocusTrack takes the template, search region, and a learnable CLS_Token as inputs, processing them through Patch Embedding and multiple self-attention layers for feature extraction. The extracted features then pass sequentially through the Search Region Adjustment, Attention-to-Mask, and Bounding Box Estimation modules, ultimately producing the target presence probability, segmentation mask, classification score map, and predicted bounding box.
  • Figure 4: Detailed explanation of the Contrastive Frame Sampling Strategy.
  • Figure 5: (a) Overall structure of ATM Module. (b) Detailed structure of a single ATM block. The ATM module consists of $M$ stacked ATM blocks, taking a learnable category query and multi-layer search tokens as input to generate the target segmentation mask.
  • ...and 8 more figures
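
The trade-off behind the search-factor experiment in Figure 2 can be made concrete with a small calculation. The sketch below assumes a square crop whose side is the square root of the target box area times the search factor, and a target that lies fully inside the crop. Both are assumptions made here for illustration, and the area fraction it computes is only one plausible reading of "relative feature proportion", not the paper's definition. The function names are hypothetical.

```python
import math


def search_crop_side(box_w: float, box_h: float, search_factor: float) -> int:
    """Side length in pixels of a square crop: ceil(sqrt(w * h) * factor)."""
    if box_w <= 0 or box_h <= 0 or search_factor <= 0:
        raise ValueError("box size and search factor must be positive")
    return math.ceil(math.sqrt(box_w * box_h) * search_factor)


def target_area_fraction(box_w: float, box_h: float, search_factor: float) -> float:
    """Share of the crop area covered by the target box (assumed fully inside)."""
    side = search_crop_side(box_w, box_h, search_factor)
    return (box_w * box_h) / (side * side)


if __name__ == "__main__":
    # A 16x16 pixel target viewed with progressively wider search regions.
    for factor in (2, 4, 8):
        side = search_crop_side(16, 16, factor)
        share = target_area_fraction(16, 16, factor)
        print(f"factor={factor} crop={side}px target share={share:.4f}")
```

Under these assumptions the target's share of the crop falls roughly as the inverse square of the search factor, which is one way to see why a wider view can dilute the features of a small target.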