GL-DT: Multi-UAV Detection and Tracking with Global-Local Integration

Juanqin Liu, Leonardo Plotegher, Eloy Roura, Shaoming He

TL;DR

GL-DT tackles real-time multi-UAV tracking in challenging aerial scenes by jointly modeling motion and appearance through a Spatio-Temporal Feature Fusion (STFF) module and a frame-level global–local detection strategy. The framework combines Global Detection, built on an enhanced AM-YOLO backbone, with Local Detection on ROIs placed along existing trajectories, and couples both with the JPTrack tracker, which uses JCMA for robust data association and PMR to recover trajectories after short-term occlusions. Experiments on the MOT-FLY and FT datasets show state-of-the-art ID continuity and tracking accuracy (e.g., high IDF1 and MOTA) while maintaining real-time speed on both PC (≈124 FPS with TensorRT) and embedded platforms (≈25 FPS on Xavier NX). These results underscore GL-DT’s practical viability for UAV-based surveillance and autonomous navigation, especially in scenes with small, cluttered, and interacting drones.

Abstract

The extensive application of unmanned aerial vehicles (UAVs) in military reconnaissance, environmental monitoring, and related domains has created an urgent need for accurate and efficient multi-object tracking (MOT) technologies, which are also essential for UAV situational awareness. However, complex backgrounds, small-scale targets, and frequent occlusions and interactions continue to challenge existing methods in terms of detection accuracy and trajectory continuity. To address these issues, this paper proposes the Global-Local Detection and Tracking (GL-DT) framework. It employs a Spatio-Temporal Feature Fusion (STFF) module to jointly model motion and appearance features, combined with a global-local collaborative detection strategy, effectively enhancing small-target detection. Building upon this, the JPTrack tracking algorithm is introduced to mitigate common issues such as ID switches and trajectory fragmentation. Experimental results demonstrate that the proposed approach significantly improves the continuity and stability of MOT while maintaining real-time performance, providing strong support for the advancement of UAV detection and tracking technologies.
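
To make the global-local collaborative detection idea concrete, the following is a minimal sketch of the coordinate bookkeeping that any ROI-based local detector needs: crop a window around a track's predicted position, run a detector on the crop, then shift the resulting boxes back into full-frame coordinates. The function names `roi_around` and `to_global`, the square window, and the clamping policy are illustrative assumptions, not details taken from the paper.

```python
def roi_around(center, size, frame_w, frame_h):
    """Square window of side `size` centred on `center`, shifted to stay inside the frame.

    Hypothetical helper: the paper does not specify how ROIs are sized or clamped.
    Returns (x1, y1, x2, y2) in full-frame pixel coordinates.
    """
    half = size / 2.0
    # Clamp the top-left corner so the window does not leave the frame.
    x1 = min(max(center[0] - half, 0.0), max(frame_w - size, 0.0))
    y1 = min(max(center[1] - half, 0.0), max(frame_h - size, 0.0))
    return (x1, y1, min(x1 + size, frame_w), min(y1 + size, frame_h))


def to_global(local_box, roi):
    """Shift a box from ROI-local coordinates into full-frame coordinates."""
    ox, oy = roi[0], roi[1]
    return (local_box[0] + ox, local_box[1] + oy,
            local_box[2] + ox, local_box[3] + oy)


if __name__ == "__main__":
    # A track predicted near the bottom-right corner of a 640x480 frame.
    roi = roi_around((630, 470), 64, 640, 480)
    # A box reported by a local detector inside the 64x64 crop.
    print(to_global((5, 6, 15, 16), roi))
```

Keeping the window a fixed size (shifted rather than shrunk at the borders) is one simple design choice that gives the local detector a constant input resolution; other policies are equally plausible.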

Paper Structure

This paper contains 24 sections, 29 equations, 8 figures, 4 tables.

Figures (8)

  • Figure 1: Some typical examples of UAVs: the first row shows small, low-visibility UAV targets, while the second row illustrates UAVs in complex backgrounds, where targets are difficult to distinguish from the environment.
  • Figure 2: The overall architecture of the proposed GL-DT framework, comprising global–local collaborative detection (GD and LD) and a tracking module. GD uses AM-YOLO for global detection, LD uses YOLO11s-P2 for local detection, and JPTrack provides stable multi-object tracking.
  • Figure 3: AM-YOLO Model Architecture: Utilizes dual-frame input and incorporates a customized STFF module for multi-scale spatial-temporal feature extraction.
  • Figure 4: Framework of the STFF module and Motion-aware attention module. STFF integrates spatio-temporal features via Motion-aware attention and feature fusion, while the attention module extracts appearance and motion features through local window operations.
  • Figure 5: JPTrack Tracking Flowchart: Composed of JCMA and PMR Modules. The JCMA constructs a comprehensive matching cost by fusing multi-dimensional features including IoU, distance, motion consistency, and geometric relationships; the PMR module probabilistically models the historical states of lost trajectories using Gaussian Mixture Models to achieve trajectory recovery after short-term occlusions.
  • ...and 3 more figures
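
The Figure 5 caption describes JCMA as fusing IoU, distance, motion consistency, and geometric relationships into a single matching cost. The sketch below shows one way such a fused cost could look. Everything specific in it is an assumption made for illustration: the weights, the normalisation of each term, the function names `fused_cost` and `associate`, and the greedy assignment step (the paper's actual cost terms and solver are not given in this summary).

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fused_cost(track_box, track_vel, det_box, diag, weights=(0.4, 0.3, 0.2, 0.1)):
    """Hypothetical fused cost in [0, 1]; lower means a better match.

    track_box is the track's last observed box, track_vel its (vx, vy) estimate,
    diag the frame diagonal used to normalise distances. Weights are assumed.
    """
    w_iou, w_dist, w_motion, w_shape = weights
    tx, ty = (track_box[0] + track_box[2]) / 2.0, (track_box[1] + track_box[3]) / 2.0
    dx, dy = (det_box[0] + det_box[2]) / 2.0, (det_box[1] + det_box[3]) / 2.0
    mx, my = dx - tx, dy - ty  # displacement from track to detection

    iou_cost = 1.0 - iou(track_box, det_box)
    dist_cost = min(1.0, math.hypot(mx, my) / diag)

    # Motion consistency: angle between the track velocity and the displacement.
    v_norm, m_norm = math.hypot(*track_vel), math.hypot(mx, my)
    if v_norm > 0 and m_norm > 0:
        cos = (track_vel[0] * mx + track_vel[1] * my) / (v_norm * m_norm)
        motion_cost = (1.0 - cos) / 2.0
    else:
        motion_cost = 0.0  # no direction information, so stay neutral

    # Geometric term: relative change in box width and height.
    tw, th = track_box[2] - track_box[0], track_box[3] - track_box[1]
    dw, dh = det_box[2] - det_box[0], det_box[3] - det_box[1]
    shape_cost = 0.5 * (abs(tw - dw) / max(tw, dw) + abs(th - dh) / max(th, dh))

    return (w_iou * iou_cost + w_dist * dist_cost
            + w_motion * motion_cost + w_shape * shape_cost)


def associate(tracks, detections, diag, max_cost=0.5):
    """Greedy one-to-one matching on the fused cost (a stand-in for the paper's solver)."""
    pairs = sorted((fused_cost(tb, tv, d, diag), ti, di)
                   for ti, (tb, tv) in enumerate(tracks)
                   for di, d in enumerate(detections))
    used_t, used_d, matches = set(), set(), []
    for cost, ti, di in pairs:
        if cost > max_cost:
            break
        if ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return sorted(matches)


if __name__ == "__main__":
    tracks = [((10, 10, 20, 20), (5, 0)), ((100, 100, 110, 110), (0, 5))]
    detections = [(100, 105, 110, 115), (15, 10, 25, 20)]
    print(associate(tracks, detections, diag=math.hypot(640, 480)))
```

Combining several weak cues in one cost is what lets association survive cases where IoU alone fails, for example small, fast-moving targets whose boxes barely overlap between frames. The PMR stage (Gaussian Mixture Model recovery of lost trajectories) is not sketched here.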