
Open-World Drone Active Tracking with Goal-Centered Rewards

Haowei Sun, Jinwu Hu, Zhirui Zhang, Haoyuan Tian, Xinze Xie, Yufeng Wang, Xiaohua Xie, Yun Lin, Zhuliang Yu, Mingkui Tan

TL;DR

The paper tackles robust open-world drone active tracking by introducing the DAT benchmark and a reinforcement learning method, GC-VAT. DAT provides 24 city-scale, high-fidelity scenes and a digital twin tool for unlimited scene generation, enabling evaluation of tracking under diverse, dynamic conditions. GC-VAT combines a Goal-Centered Reward with Curriculum-Based Training to overcome the failure of prior distance-based rewards under tilted viewpoints, achieving superior tracking performance in both simulated and real-world tests, including sim-to-real transfer. The work advances open-world VAT with a unified benchmark, a reward design supported by a formal proposition, and a practical training strategy.

Abstract

Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations, providing a more practical solution for effective tracking in dynamic environments. However, accurate Drone Visual Active Tracking using reinforcement learning remains challenging due to the absence of a unified benchmark and the complexity of open-world environments with frequent interference. To address these issues, we propose a systematic solution. First, we propose DAT, the first open-world drone active air-to-ground tracking benchmark. It encompasses 24 city-scale scenes, featuring targets with human-like behaviors and high-fidelity dynamics simulation. DAT also provides a digital twin tool for unlimited scene generation. Additionally, we propose a novel reinforcement learning method called GC-VAT, which aims to improve drone target tracking in complex scenarios. Specifically, we design a Goal-Centered Reward to provide precise feedback across viewpoints to the agent, enabling it to expand its perception and movement range through unrestricted perspectives. Inspired by curriculum learning, we introduce a Curriculum-Based Training strategy that progressively enhances tracking performance in complex environments. Experiments on the simulator and on real-world images demonstrate the superior performance of GC-VAT, which achieves a Tracking Success Rate of approximately 72% on the simulator. The benchmark and code are available at https://github.com/SHWplus/DAT_Benchmark.

Paper Structure

This paper contains 32 sections, 1 theorem, 22 equations, 17 figures, 20 tables, 1 algorithm.

Key Result

Proposition 1

The commonly used Euclidean distance $d(\cdot,\cdot)$ between the target and the image center projection does not align with the deviation $\phi(\cdot,\cdot)$ of the target from the image center projection when the camera is not at a fixed horizontal forward viewpoint. That is: where $\phi_i=\phi(P_i,C_g)$, $P_i$ are points in the projection region $\mathcal{I}_p$, $C_g$ is the image center pro
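The intuition behind Proposition 1 can be illustrated with a small geometric sketch. It is not the paper's formulation: it assumes an ideal pinhole camera at height `h` above a flat ground plane, pitched down by `pitch` from the horizontal, with focal length `f` in pixels and pixel coordinates `(u, v)` measured from the image center (`v` pointing down). It also assumes the deviation can be read as the ground-plane distance between a pixel's projection and the projection of the image center. The function names and the numeric values are hypothetical.

```python
import math


def pixel_to_ground(u, v, f, h, pitch):
    """Intersect the ray through centered pixel (u, v) with the ground plane.

    World frame (assumed): X right, Y forward, Z up, camera at (0, 0, h).
    The camera looks along (0, cos(pitch), -sin(pitch)).
    Returns the (X, Y) ground coordinates of the intersection.
    """
    s, c = math.sin(pitch), math.cos(pitch)
    # The ray direction in the world frame is (u, f*c - v*s, -(v*c + f*s)).
    denom = v * c + f * s
    if denom <= 0:
        raise ValueError("ray does not hit the ground (at or above the horizon)")
    t = h / denom  # scale at which the ray reaches Z = 0
    return (t * u, t * (f * c - v * s))


def ground_deviation(u, v, f, h, pitch):
    """Ground distance between the projection of pixel (u, v) and the
    projection of the image center (an assumed reading of the deviation)."""
    px, py = pixel_to_ground(u, v, f, h, pitch)
    cx, cy = pixel_to_ground(0.0, 0.0, f, h, pitch)
    return math.hypot(px - cx, py - cy)


# Two pixels at the same image distance (100 px) from the center:
# one above the center, one below it.
f, h = 500.0, 10.0
for name, pitch_deg in [("tilted 45 deg", 45.0), ("top-down 90 deg", 90.0)]:
    pitch = math.radians(pitch_deg)
    above = ground_deviation(0.0, -100.0, f, h, pitch)
    below = ground_deviation(0.0, 100.0, f, h, pitch)
    print(f"{name}: above={above:.3f} m, below={below:.3f} m")
```

Under these assumptions the two pixels map to equal ground deviations only in the top-down case. With the tilted camera, the pixel above the center lies farther from the projected center on the ground than the pixel below it, so a reward based purely on image-plane distance would score both positions identically even though the target's true offsets differ.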

Figures (17)

  • Figure 1: A pipeline for drone VAT.
  • Figure 2: Statistics and simulator component examples of DAT. (a) Statistics on 7 complexity aspects in DAT scenes. (b) Example scenes of DAT. (c) Diverse behaviors of targets. (d) Examples of the tracking targets. More details can be found at https://github.com/SHWplus/DAT_Benchmark.
  • Figure 3: Diagram of reward acquisition.
  • Figure 4: Reward design analysis diagram.
  • Figure 5: Reward values during training.
  • ...and 12 more figures

Theorems & Definitions (1)

  • Proposition 1