MUST: The First Dataset and Unified Framework for Multispectral UAV Single Object Tracking
Haolin Qin, Tingfa Xu, Tianhao Li, Zhenxiang Chen, Tao Feng, Jianan Li
TL;DR
This work tackles the limitations of RGB UAV tracking under challenging conditions by introducing MUST, the first large-scale multispectral UAV tracking dataset (250 sequences, 43k frames, 8 spectral bands at 1200×900) with 12 challenge attributes. It also presents UNTrack, a Unified Spectral-Spatial-Temporal Tracker built on a Unified Asymmetric Transformer, a spectral background elimination mechanism, and a Spectrum Prompt Encoder, all enhanced by an MSI parameter reconstruction strategy for initialization. UNTrack integrates spectral, spatial, and temporal cues through asymmetric attention that prunes nonessential interactions, updates a spectrum prompt across frames, and outputs precise bounding boxes via a dual-branch head trained with $\mathcal{L}=\mathcal{L}_{cls}+\lambda_1\mathcal{L}_1+\lambda_2\mathcal{L}_{GIoU}$. Empirical results show UNTrack achieves state-of-the-art performance on MUST, plus favorable efficiency and versatility on MSI-based and RGB-based tracking benchmarks, signaling strong potential for real-world multispectral UAV tracking applications.
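The training objective $\mathcal{L}=\mathcal{L}_{cls}+\lambda_1\mathcal{L}_1+\lambda_2\mathcal{L}_{GIoU}$ can be illustrated for a single predicted box with the minimal sketch below. It is not the authors' implementation. The function names, the `(x1, y1, x2, y2)` box format, the mean-over-coordinates L1 term, and the default weights `lambda_1=5.0` and `lambda_2=2.0` are all assumptions made for illustration. The classification loss is passed in as a precomputed scalar because its exact form is not specified here.

```python
def giou(box_a, box_b):
    """Generalized IoU of two boxes in (x1, y1, x2, y2) format.

    Assumes both boxes have positive width and height.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter

    # Smallest axis-aligned box enclosing both inputs.
    enclose_w = max(ax2, bx2) - min(ax1, bx1)
    enclose_h = max(ay2, by2) - min(ay1, by1)
    enclose = enclose_w * enclose_h

    iou = inter / union
    return iou - (enclose - union) / enclose


def l1_distance(box_a, box_b):
    """Mean absolute difference over the four box coordinates (assumed form)."""
    return sum(abs(a - b) for a, b in zip(box_a, box_b)) / 4.0


def tracking_loss(cls_loss, pred_box, gt_box, lambda_1=5.0, lambda_2=2.0):
    """L = L_cls + lambda_1 * L_1 + lambda_2 * L_GIoU, with L_GIoU = 1 - GIoU.

    lambda_1 and lambda_2 are placeholder weights, not values from the paper.
    """
    loss_l1 = l1_distance(pred_box, gt_box)
    loss_giou = 1.0 - giou(pred_box, gt_box)
    return cls_loss + lambda_1 * loss_l1 + lambda_2 * loss_giou
```

With a perfect box prediction the two regression terms vanish and the total reduces to the classification loss. For disjoint boxes the GIoU term stays informative because it penalizes the empty area of the enclosing box.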
Abstract
UAV tracking faces significant challenges in real-world scenarios, such as small targets and occlusions, which limit the performance of RGB-based trackers. Multispectral images (MSI), which capture additional spectral information, offer a promising solution to these challenges. However, progress in this field has been hindered by the lack of relevant datasets. To address this gap, we introduce the first large-scale Multispectral UAV Single Object Tracking dataset (MUST), which includes 250 video sequences spanning diverse environments and challenges, providing a comprehensive data foundation for multispectral UAV tracking. We also propose a novel tracking framework, UNTrack, which encodes unified spectral, spatial, and temporal features from spectrum prompts, initial templates, and sequential searches. UNTrack employs an asymmetric transformer with a spectral background elimination mechanism for optimal relationship modeling and an encoder that continuously updates the spectrum prompt to refine tracking, improving both accuracy and efficiency. Extensive experiments show that our proposed UNTrack outperforms state-of-the-art UAV trackers. We believe our dataset and framework will drive future research in this area. The dataset is available at https://github.com/q2479036243/MUST-Multispectral-UAV-Single-Object-Tracking.
