Open-World Drone Active Tracking with Goal-Centered Rewards
Haowei Sun, Jinwu Hu, Zhirui Zhang, Haoyuan Tian, Xinze Xie, Yufeng Wang, Xiaohua Xie, Yun Lin, Zhuliang Yu, Mingkui Tan
TL;DR
The paper tackles robust open-world drone active tracking by introducing the DAT benchmark and a reinforcement learning method, GC-VAT. DAT provides 24 city-scale, high-fidelity scenes and a digital twin tool for unlimited scene generation, enabling rigorous evaluation of tracking under diverse, dynamic conditions. GC-VAT uses a Goal-Centered Reward and Curriculum-Based Training to overcome the failures of prior distance-based rewards under tilted viewpoints, achieving superior tracking performance in both simulated and real-world tests, including notable sim-to-real transfer. The work advances open-world VAT by delivering a unified benchmark, provable reward design advantages, and practical training strategies with demonstrated robustness and real-world applicability.
Abstract
Drone Visual Active Tracking aims to autonomously follow a target object by controlling the motion system based on visual observations, providing a more practical solution for effective tracking in dynamic environments. However, accurate Drone Visual Active Tracking using reinforcement learning remains challenging due to the absence of a unified benchmark and the complexity of open-world environments with frequent interference. To address these issues, we propose a systematic solution. First, we propose DAT, the first open-world drone active air-to-ground tracking benchmark. It encompasses 24 city-scale scenes, featuring targets with human-like behaviors and high-fidelity dynamics simulation. DAT also provides a digital twin tool for unlimited scene generation. Second, we propose a novel reinforcement learning method called GC-VAT, which aims to improve drone tracking performance in complex scenarios. Specifically, we design a Goal-Centered Reward that gives the agent precise feedback across viewpoints, enabling it to expand its perception and movement range through unrestricted perspectives. Inspired by curriculum learning, we introduce a Curriculum-Based Training strategy that progressively enhances tracking performance in complex environments. Experiments in simulation and on real-world images demonstrate the superior performance of GC-VAT, which achieves a Tracking Success Rate of approximately 72% in the simulator. The benchmark and code are available at https://github.com/SHWplus/DAT_Benchmark.
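To illustrate the intuition behind a reward that stays meaningful across viewpoints, the following is a minimal sketch of an image-plane reward that depends only on where the target appears in the camera frame. It is not the Goal-Centered Reward defined in the paper: the function name `image_centered_reward`, the linear falloff, and the zero reward for out-of-frame targets are all assumptions made for illustration.

```python
import math


def image_centered_reward(u, v, width, height):
    """Hypothetical image-plane reward (an illustrative assumption,
    not the paper's Goal-Centered Reward).

    (u, v) is the target's pixel position; width and height are the
    image dimensions. The reward is 1.0 at the image center, falls
    linearly to 0.0 at the corners, and is 0.0 if the target is
    outside the frame.
    """
    # Target not visible: no reward.
    if not (0 <= u < width and 0 <= v < height):
        return 0.0

    cx, cy = width / 2.0, height / 2.0
    dist = math.hypot(u - cx, v - cy)   # pixel distance to the center
    max_dist = math.hypot(cx, cy)       # center-to-corner distance
    return max(0.0, 1.0 - dist / max_dist)


if __name__ == "__main__":
    # Target at the center, off-center, and out of frame.
    print(image_centered_reward(50, 50, 100, 100))
    print(image_centered_reward(75, 50, 100, 100))
    print(image_centered_reward(-1, 10, 100, 100))
```

Because such a reward is computed from the target's position in the image rather than from a ground-plane distance, it does not change its meaning when the camera is tilted, which is the kind of failure the abstract attributes to distance-based rewards.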
