CST Anti-UAV: A Thermal Infrared Benchmark for Tiny UAV Tracking in Complex Scenes

Bin Xie, Congxuan Zhang, Fagan Wang, Peng Liu, Feng Lu, Zhen Chen, Weiming Hu

TL;DR

CST Anti-UAV introduces a large-scale thermal infrared benchmark for single-object tracking of tiny UAVs in complex scenes, addressing gaps in existing datasets by providing complete frame-level attribute annotations across six challenges. The dataset comprises 220 sequences with over 240k high-quality bounding boxes, and data are manually annotated to enable fine-grained evaluation. Benchmarking 20 SOT methods reveals substantial difficulty from tiny targets and dynamic backgrounds, with state-of-the-art performance dropping on CST Anti-UAV compared to prior datasets; training on CST improves performance but reveals the limits of existing methods. The work provides a valuable resource for developing robust anti-UAV trackers and advancing real-world vision-based counter-UAV systems.

Abstract

The widespread application of Unmanned Aerial Vehicles (UAVs) has raised serious public safety and privacy concerns, making UAV perception crucial for anti-UAV tasks. However, existing UAV tracking datasets predominantly feature conspicuous objects and lack diversity in scene complexity and attribute representation, limiting their applicability to real-world scenarios. To overcome these limitations, we present CST Anti-UAV, a new thermal infrared dataset specifically designed for Single Object Tracking (SOT) in Complex Scenes with Tiny UAVs (CST). It contains 220 video sequences with over 240k high-quality bounding box annotations and has two key properties: a large number of tiny UAV targets and diverse, complex scenes. To the best of our knowledge, CST Anti-UAV is the first dataset to incorporate complete manual frame-level attribute annotations, enabling precise evaluation under varied challenges. For an in-depth performance analysis, we evaluate 20 existing SOT methods on the proposed dataset. Experimental results demonstrate that tracking tiny UAVs in complex environments remains a challenge: the state-of-the-art method achieves only 35.92% state accuracy, much lower than the 67.69% observed on the Anti-UAV410 dataset. These findings underscore the limitations of existing benchmarks and the need for further advances in UAV tracking research. The CST Anti-UAV benchmark will be publicly released to foster the development of more robust SOT methods and drive innovation in anti-UAV systems.
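The abstract reports state accuracy (SA) without defining it. As a rough illustration only, the sketch below assumes the definition used by the earlier Anti-UAV benchmarks: the per-frame score is the IoU between the predicted and ground-truth boxes when the target is visible, and 1 or 0 when the target is absent, depending on whether the tracker correctly reports no target. The function names, the `(x, y, width, height)` box format, and the use of `None` for "no target" are assumptions made for this example, not the authors' evaluation code.

```python
from typing import Optional, Sequence, Tuple

# Assumed box format for this sketch: (x, y, width, height) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def state_accuracy(
    gt: Sequence[Optional[Box]], pred: Sequence[Optional[Box]]
) -> float:
    """Mean per-frame score under the assumed SA definition.

    gt[t] is None when the target is absent in frame t;
    pred[t] is None when the tracker reports no target.
    """
    if len(gt) != len(pred):
        raise ValueError("gt and pred must have the same number of frames")
    if not gt:
        return 0.0
    total = 0.0
    for g, p in zip(gt, pred):
        if g is not None:
            # Target visible: score is the IoU (0 if the tracker reports nothing).
            total += iou(g, p) if p is not None else 0.0
        else:
            # Target absent: score 1 only if the tracker also reports nothing.
            total += 1.0 if p is None else 0.0
    return total / len(gt)
```

Under this assumed definition, a tracker is rewarded both for overlapping the target when it is visible and for staying silent when it is not, so missed detections on tiny targets and false alarms during out-of-view frames both lower the score.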

Paper Structure

This paper contains 13 sections, 1 equation, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Examples of sequence clips from the CST Anti-UAV dataset. The UAV is annotated with a red bounding box. The most notable feature of CST Anti-UAV is its complex scenes, as shown at the bottom, including occlusion (C), complex dynamic background (D), scale variation (S), thermal crossover (T), and out-of-view (V). The backgrounds include buildings (B) and urban areas (U).
  • Figure 2: Main features and statistics of the proposed dataset. (a) Distribution of scene (outer) and object size (inner). (b) A comparison of the sequence-level attributes of existing anti-UAV tracking datasets and our proposed dataset, emphasizing their significance and applicability for practical anti-UAV tasks in terms of video count. The numbers from Zero to Five on the table represent the number of attributes contained in a sequence. (c) Comparative analysis of the relative size distributions of objects in Anti-UAV, Anti-UAV410, and CST Anti-UAV. (d) Statistics of frame-level attributes in CST Anti-UAV. The area of each circle denotes the total number of frames.
  • Figure 3: The statistical distributions of the training set, validation set, and test set for each attribute are shown. The attribute distributions across the three subsets exhibit the same trends.
  • Figure 4: Frame-level and sequence-level attribute SA results compared to overall SA results in terms of fluctuation.
  • Figure 5: Scale evaluation on Anti-UAV410 test set. Each column specifies the training dataset used.