FocusTrack: A Self-Adaptive Local Sampling Algorithm for Efficient Anti-UAV Tracking
Ying Wang, Tingfa Xu, Jianan Li
TL;DR
FocusTrack tackles Anti-UAV tracking by balancing the efficiency of local tracking against the robustness of global search through two components: a Search Region Adjustment (SRA) strategy that estimates target presence and adapts the field of view accordingly, and an Attention-to-Mask (ATM) module that fuses hierarchical features into fine-grained masks. Built on a Transformer-based local tracking backbone, the approach integrates a CLS_Token for presence estimation, a contrastive sampling strategy, and multi-layer cross-attention to stabilize tracking under abrupt camera motion and cluttered infrared backgrounds. On AntiUAV and AntiUAV410, FocusTrack reports state-of-the-art accuracy at real-time speed: 143 fps for the lighter FocusTrack (SRA) variant and 44 fps for the full model with ATM. It outperforms several local trackers while remaining more efficient than global re-detection methods. Overall, FocusTrack offers a practical, real-time Anti-UAV tracking solution that combines adaptive search behavior with refined feature representations, with potential extensions to segmentation-guided robustness and richer data sources.
Abstract
Anti-UAV tracking poses significant challenges, including small target sizes, abrupt camera motion, and cluttered infrared backgrounds. Existing tracking paradigms can be broadly categorized into global- and local-based methods. Global-based trackers, such as SiamDT, achieve high accuracy by scanning the entire field of view but incur excessive computational overhead, which limits real-world deployment. In contrast, local-based methods, including OSTrack and ROMTrack, efficiently restrict the search region but struggle when targets undergo large displacements caused by abrupt camera motion. Our preliminary experiments show that a local tracker paired with adaptive search region adjustment achieves significantly higher tracking accuracy, narrowing the gap between local and global trackers. Building on this observation, we propose FocusTrack, a novel framework that dynamically refines the search region and strengthens feature representations, balancing computational efficiency against tracking accuracy. Specifically, our Search Region Adjustment (SRA) strategy estimates the target presence probability and adaptively adjusts the field of view so that the target remains within focus. Furthermore, to counteract the feature degradation caused by varying search regions, we propose the Attention-to-Mask (ATM) module, which integrates hierarchical information and enriches the target representation with fine-grained details. Experimental results demonstrate that FocusTrack achieves state-of-the-art performance, obtaining 67.7% AUC on AntiUAV and 62.8% AUC on AntiUAV410, outperforming the baseline tracker by 8.5% and 9.1% AUC, respectively. In terms of efficiency, FocusTrack surpasses global-based trackers, requiring only 30G MACs and running at 143 fps with FocusTrack (SRA) and 44 fps with the full version, both of which enable real-time tracking.
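To make the SRA idea concrete, the following is a minimal sketch of one way a presence probability could drive the search region size. It is not the authors' implementation: the function names `adjust_search_factor` and `crop_box`, the threshold of 0.5, the base factor of 4.0, the cap of 16.0, and the doubling rule are all assumptions chosen for illustration. The abstract states only that the field of view is adjusted according to the estimated target presence probability.

```python
def adjust_search_factor(factor, presence_prob, *, threshold=0.5,
                         base_factor=4.0, max_factor=16.0, growth=2.0):
    """Return the search factor to use on the next frame.

    `factor` is the current search-region side length expressed as a
    multiple of the target size. All default values are hypothetical.
    """
    if not 0.0 <= presence_prob <= 1.0:
        raise ValueError("presence_prob must lie in [0, 1]")
    if presence_prob < threshold:
        # Target is probably outside the crop: widen the field of view,
        # but never beyond max_factor.
        return min(factor * growth, max_factor)
    # Target is confidently present: return to the cheap local crop.
    return base_factor


def crop_box(center, target_size, factor, frame_size):
    """Square search crop around the last known position, clipped to the frame.

    Returns (x0, y0, x1, y1) in pixel coordinates.
    """
    cx, cy = center
    w, h = target_size
    frame_w, frame_h = frame_size
    # Side length scales with the geometric mean of the target dimensions.
    half = factor * (w * h) ** 0.5 / 2.0
    x0 = max(0.0, cx - half)
    y0 = max(0.0, cy - half)
    x1 = min(float(frame_w), cx + half)
    y1 = min(float(frame_h), cy + half)
    return x0, y0, x1, y1


if __name__ == "__main__":
    factor = 4.0
    # Hypothetical presence probabilities over five consecutive frames.
    for prob in (0.9, 0.2, 0.1, 0.3, 0.8):
        factor = adjust_search_factor(factor, prob)
        print(prob, factor, crop_box((320, 256), (16, 16), factor, (640, 512)))
```

Under these assumed settings, the crop stays small while the target is confidently present, grows geometrically over consecutive low-confidence frames, and shrinks back once the target is found again. This mirrors the trade-off the abstract describes between local efficiency and global coverage.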
