Diffusion Models for Multi-target Adversarial Tracking
Sean Ye, Manisha Natarajan, Zixuan Wu, Matthew Gombolay
TL;DR
This paper tackles adversarial target tracking under partial observability in large-scale domains. It introduces CADENCE, a diffusion-model approach that uses cross-attention and constraint-guided sampling to generate multimodal, obstacle-aware trajectories from sparse detections. Key contributions include (1) a cross-attention diffusion architecture that enables implicit target assignment for multi-target tracking, (2) a constraint-guided sampling scheme that enforces motion and obstacle constraints and reduces mountain collisions by about 90%, and (3) a single-target diffusion model that improves Average Displacement Error (ADE) by about 9.2% over the previous state of the art while producing complete trajectories across horizons. The method targets autonomous tracking in UAV-enabled security and safety applications, where complete multimodal trajectory predictions can inform pursuit and interdiction strategies under partial observability.
Abstract
Target tracking plays a crucial role in real-world scenarios, particularly in drug-trafficking interdiction, where the knowledge of an adversarial target's location is often limited. Improving autonomous tracking systems will enable unmanned aerial, surface, and underwater vehicles to better assist in interdicting smugglers that use manned surface, semi-submersible, and aerial vessels. As unmanned drones proliferate, accurate autonomous target estimation is even more crucial for security and safety. This paper presents Constrained Agent-based Diffusion for Enhanced Multi-Agent Tracking (CADENCE), an approach aimed at generating comprehensive predictions of adversary locations by leveraging past sparse state information. To assess the effectiveness of this approach, we evaluate predictions on single-target and multi-target pursuit environments, employing Monte-Carlo sampling of the diffusion model to estimate the probability associated with each generated trajectory. We propose a novel cross-attention based diffusion model that utilizes constraint-based sampling to generate multimodal track hypotheses. Our single-target model surpasses the performance of all baseline methods on Average Displacement Error (ADE) for predictions across all time horizons.
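For readers unfamiliar with the metric, the sketch below shows how ADE is commonly computed: the mean Euclidean distance between predicted and ground-truth positions over the timesteps of a trajectory. It is an illustration under stated assumptions, not the authors' evaluation code. The function names, the 2D waypoint format, and the choice to average ADE over sampled trajectories are all assumptions made for this example.

```python
import math


def average_displacement_error(predicted, ground_truth):
    """Mean Euclidean distance between matching waypoints of two trajectories.

    Both arguments are sequences of (x, y) positions, one per timestep.
    """
    if len(predicted) != len(ground_truth) or not predicted:
        raise ValueError("trajectories must be non-empty and of equal length")
    total = sum(math.dist(p, g) for p, g in zip(predicted, ground_truth))
    return total / len(predicted)


def mean_ade_over_samples(samples, ground_truth):
    """Average ADE across several sampled trajectories.

    Assumption: `samples` stands in for trajectories drawn from a generative
    model; how the paper aggregates over samples is not specified here.
    """
    return sum(average_displacement_error(s, ground_truth) for s in samples) / len(samples)


# Toy data: a target moving along the x-axis and two hypothetical predictions.
truth = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
sample_a = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]  # per-step errors 0, 1, 2
sample_b = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]  # exact match

ade_a = average_displacement_error(sample_a, truth)          # (0 + 1 + 2) / 3
mean_ade = mean_ade_over_samples([sample_a, sample_b], truth)
```

A lower ADE means the predicted track stays closer to the true track on average, which is why the metric can be compared across prediction horizons of different lengths.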
