Progressive Representation Learning for Real-Time UAV Tracking
Changhong Fu, Xiang Lei, Haobo Zuo, Liangliang Yao, Guangze Zheng, Jia Pan
TL;DR
This work tackles robust real-time object tracking for UAVs under challenges such as occlusion and aspect-ratio change. It introduces PRL-Track, a progressive coarse-to-fine framework. In the coarse stage, an appearance-aware regulator and a semantic-aware regulator refine CNN-based representations. In the fine stage, a ViT-based Hierarchical Modeling Generator fuses those coarse representations through hierarchical cross-attention. The approach reports state-of-the-art results on UAVTrack112, UAVTrack112_L, and UAV123, and runs at 42.6 FPS on an edge platform in real-world tests. These results suggest that combining local CNN features with global ViT modeling in a progressive, interaction-rich architecture can improve tracking robustness while keeping real-time speed on UAV hardware.
Abstract
Visual object tracking has significantly promoted autonomous applications for unmanned aerial vehicles (UAVs). However, learning robust object representations for UAV tracking is especially challenging in complex dynamic environments, where aspect ratio change and occlusion severely alter the original information of the object. To handle these issues, this work proposes a novel progressive representation learning framework for UAV tracking, i.e., PRL-Track. Specifically, PRL-Track is divided into coarse representation learning and fine representation learning. For coarse representation learning, two innovative regulators, which rely on appearance and semantic information, are designed to mitigate appearance interference and capture semantic information. Furthermore, for fine representation learning, a new hierarchical modeling generator is developed to intertwine coarse object representations. Extensive experiments demonstrate that the proposed PRL-Track delivers exceptional performance on three authoritative UAV tracking benchmarks. Real-world tests indicate that the proposed PRL-Track realizes superior tracking performance at 42.6 frames per second on a typical UAV platform equipped with an edge smart camera. The code, model, and demo videos are available at https://github.com/vision4robotics/PRL-Track.
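To make the idea of intertwining coarse representations through cross-attention more concrete, the following is a minimal illustrative sketch in Python with NumPy. It is not the authors' implementation (that is in the linked repository). It shows only generic single-head scaled dot-product cross-attention without learned projections, applied level by level with a residual sum. The function names `cross_attention` and `fuse_coarse_to_fine`, the token counts, the channel width, and the fusion order are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(query_tokens, context_tokens):
    """Generic scaled dot-product cross-attention (hypothetical helper).

    query_tokens:   (Nq, d) tokens that ask the question
    context_tokens: (Nc, d) tokens that are attended to
    Returns the attended tokens (Nq, d) and the attention weights (Nq, Nc).
    Learned query/key/value projections are omitted to keep the sketch short.
    """
    d = query_tokens.shape[-1]
    scores = query_tokens @ context_tokens.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ context_tokens, weights


def fuse_coarse_to_fine(coarse_levels):
    """Fold a list of coarse token sets into one representation.

    Assumption: each level is an (N, d) array with a shared channel width d.
    Each step lets the running representation attend to the next level and
    adds the result back as a residual.
    """
    fused = coarse_levels[0]
    for level in coarse_levels[1:]:
        attended, _ = cross_attention(fused, level)
        fused = fused + attended
    return fused


# Toy stand-ins for three coarse representations (random values, not real features).
rng = np.random.default_rng(0)
levels = [rng.standard_normal((16, 32)) for _ in range(3)]
fine = fuse_coarse_to_fine(levels)
print(fine.shape)  # (16, 32)
```

In this sketch the output keeps the token count of the first level, because that level acts as the query throughout. A real tracker would add learned projections, multiple heads, normalization, and feed-forward layers, none of which are modeled here.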
