Progressive Representation Learning for Real-Time UAV Tracking

Changhong Fu, Xiang Lei, Haobo Zuo, Liangliang Yao, Guangze Zheng, Jia Pan

TL;DR

This work tackles robust real-time UAV object tracking under challenging dynamics such as occlusion and aspect-ratio changes. It introduces PRL-Track, a progressive coarse-to-fine framework that combines CNN-based coarse representations with a ViT-based fine representation, enabled by an appearance-aware regulator, a semantic-aware regulator, and a Hierarchical Modeling Generator that uses hierarchical cross-attention. The approach yields state-of-the-art results on UAVTrack112, UAVTrack112_L, and UAV123, while running at 42.6 FPS on an edge platform, demonstrating practical applicability. These results suggest that integrating local CNN features with global ViT modeling in a progressive, interaction-rich architecture can improve tracking robustness in real-world UAV environments while retaining real-time speed.

Abstract

Visual object tracking has significantly promoted autonomous applications for unmanned aerial vehicles (UAVs). However, learning robust object representations for UAV tracking is especially challenging in complex dynamic environments, particularly under aspect ratio change and occlusion, which severely alter the original information of the object. To handle these issues, this work proposes a novel progressive representation learning framework for UAV tracking, i.e., PRL-Track. Specifically, PRL-Track is divided into coarse representation learning and fine representation learning. For coarse representation learning, two innovative regulators, which rely on appearance and semantic information, are designed to mitigate appearance interference and capture semantic information. For fine representation learning, a new hierarchical modeling generator is developed to intertwine coarse object representations. Exhaustive experiments demonstrate that the proposed PRL-Track delivers exceptional performance on three authoritative UAV tracking benchmarks. Real-world tests indicate that the proposed PRL-Track realizes superior tracking performance at 42.6 frames per second on a typical UAV platform equipped with an edge smart camera. The code, model, and demo videos are available at https://github.com/vision4robotics/PRL-Track.

Paper Structure (23 sections, 9 equations, 7 figures, 2 tables)


Figures (7)

  • Figure 1: Overall comparison of the proposed PRL-Track with 14 other state-of-the-art (SOTA) trackers on the combination of UAV tracking benchmarks. PRL-Track achieves more robust performance than the 14 other SOTA trackers. Specifically, PRL-Track surpasses the average precision and success rate of the 14 trackers (black dot) by 7.8% and 14.1%, respectively.
  • Figure 2: Illustration of the proposed progressive representation learning framework for UAV tracking. In coarse representation learning, the appearance-aware regulator and semantic-aware regulator generate coarse object representations, which highlight different features of the image. In fine representation learning, the coarse object representations are first divided into patches, then projected, split, and reassembled to obtain $\textbf{M}_3$, $\textbf{M}_4$, and $\textbf{M}_5$, respectively, which are then fused via hierarchical cross-attention. Best viewed in color. Image frames are from UAV123 [mueller2016benchmark].
  • Figure 3: Structure of the proposed appearance-aware regulator (AR, above) and semantic-aware regulator (SR, below). The AR is designed to mitigate appearance interference, while the SR is designed to capture semantic information.
  • Figure 4: Detailed workflow of the proposed hierarchical modeling generator (HMG). Through the interaction operation and cross-attention, the $\textbf{QKV}$ pairs from different hierarchy levels, i.e., $\textbf{M}_3$, $\textbf{M}_4$, and $\textbf{M}_5$, can communicate with each other. Best viewed in color.
  • Figure 5: Overall performance of PRL-Track and SOTA trackers on UAVTrack112 [fu2021onboard], UAVTrack112_L [fu2021onboard], and UAV123 [mueller2016benchmark]. The experimental results showcase the superior performance of the proposed PRL-Track on all benchmarks.
  • ...and 2 more figures
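
The captions for Figures 2 and 4 describe token sets from three hierarchy levels ($\textbf{M}_3$, $\textbf{M}_4$, $\textbf{M}_5$) exchanging information through cross-attention. The following is a minimal sketch of that general idea in plain Python, not the authors' implementation. Several things here are assumptions made only for illustration: the function names (`cross_attention`, `hierarchical_fuse`), the fusion order (M5 queries M4, then the result queries M3), the residual additions, and the absence of learned query/key/value projections. The actual HMG design is in the paper and the linked repository.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query token attends over all key tokens.

    queries: list of n_q token vectors of length d
    keys, values: lists of n_k token vectors (keys of length d)
    Returns n_q output vectors, each a weighted average of the value vectors.
    """
    d = len(queries[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

def add_tokens(a, b):
    """Element-wise sum of two equally shaped token lists (residual connection)."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]

def hierarchical_fuse(m3, m4, m5):
    """Hypothetical fusion order: M5 queries M4, then the fused tokens query M3."""
    fused_45 = add_tokens(m5, cross_attention(m5, m4, m4))
    return add_tokens(fused_45, cross_attention(fused_45, m3, m3))

# Toy token sets (2-dimensional tokens); the values are arbitrary.
m3 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
m4 = [[0.5, 0.5], [1.0, -1.0]]
m5 = [[0.2, 0.8], [0.9, 0.1]]
fused = hierarchical_fuse(m3, m4, m5)
```

The fused output keeps the token count and dimension of the query level (here `m5`), while the key/value levels may have a different number of tokens. A real tracker would add learned linear projections, multiple heads, and normalization around each attention step.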