Learning Multi-Pursuit Evasion for Safe Targeted Navigation of Drones

Jiaping Xiao; Mir Feroskhan

TL;DR

This work addresses safe targeted navigation of a drone under attack from multiple intelligent pursuers. It formulates multi-pursuit evasion with targeted navigation (MPETN) as an adversarial mixed game and solves it with AMS-DRL, an asynchronous multi-stage deep reinforcement learning framework. A cold-start stage first trains the evader to reach the target. A shared chaser policy is then trained asynchronously, phase by phase, against the evolving evader, and the policies converge toward a Nash equilibrium. In extensive simulations and physical tests, AMS-DRL outperforms baselines, including APF and PPO variants. A success-rate heatmap shows how spatial geometry affects navigation outcomes, and the trained policies show promising Sim2Real transfer on three Tello Edu drones. The approach is a principled, scalable method for robust, adversarially aware drone navigation and may apply to other robotic platforms facing intelligent attacks.

Abstract

Safe navigation of drones in the presence of adversarial physical attacks from multiple pursuers is a challenging task. This paper proposes a novel approach, asynchronous multi-stage deep reinforcement learning (AMS-DRL), to train adversarial neural networks that can learn from the actions of multiple evolved pursuers and adapt quickly to their behavior, enabling the drone to avoid attacks and reach its target. Specifically, AMS-DRL evolves adversarial agents in a pursuit-evasion game where the pursuers and the evader are trained asynchronously, in a bipartite-graph manner, over multiple stages. A game-theoretic analysis guarantees convergence by establishing a Nash equilibrium among the agents. We evaluate our method in extensive simulations and show that it achieves higher navigation success rates than the baselines. We also analyze how parameters such as the relative maximum speed affect navigation performance. Furthermore, we have conducted physical experiments and validated the effectiveness of the trained policies in real-time flights. A success-rate heatmap is introduced to show how spatial geometry influences navigation outcomes. Project website: https://github.com/NTU-ICG/AMS-DRL-for-Pursuit-Evasion.
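The asynchronous, multi-stage training described above can be illustrated as a phase schedule in which only one side learns at a time. The following is a minimal sketch, not the authors' implementation: the names `Phase`, `ams_schedule`, and `run_schedule` are hypothetical, the strict alternation after the cold start is an assumption, and `train_fn` is a placeholder for whatever policy-update routine is used in practice.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Phase:
    name: str     # phase label, e.g. "S0"
    learner: str  # policy updated in this phase
    frozen: str   # policy held fixed ("none" during the cold start)


def ams_schedule(num_phases: int) -> List[Phase]:
    """Build an alternating schedule (assumed ordering, for illustration).

    S0 is the cold start: the runner trains alone to reach the target.
    Afterwards, odd phases train the shared chaser policy against a frozen
    runner, and even phases train the runner against frozen chasers.
    """
    if num_phases < 1:
        raise ValueError("need at least the cold-start phase")
    phases = [Phase("S0", learner="runner", frozen="none")]
    for k in range(1, num_phases):
        if k % 2 == 1:
            learner, frozen = "chaser", "runner"
        else:
            learner, frozen = "runner", "chaser"
        phases.append(Phase(f"S{k}", learner, frozen))
    return phases


def run_schedule(
    phases: List[Phase], train_fn: Callable[[str, str], None]
) -> Dict[str, int]:
    """Call train_fn(learner, frozen) once per phase; count phases per policy."""
    updates = {"runner": 0, "chaser": 0}
    for phase in phases:
        train_fn(phase.learner, phase.frozen)
        updates[phase.learner] += 1
    return updates


if __name__ == "__main__":
    # Placeholder trainer that only reports which side is learning.
    def report(learner: str, frozen: str) -> None:
        print(f"train {learner} (frozen: {frozen})")

    run_schedule(ams_schedule(4), report)
```

Freezing one side per phase gives the learning side a stationary opponent, which is the usual motivation for alternating schemes of this kind. The number of phases and the stopping rule are left open here because the text above does not specify them.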

Paper Structure (39 sections, 2 theorems, 22 equations, 11 figures, 5 tables, 1 algorithm)

Introduction
Related Works
Obstacle Avoidance
Pursuit-Evasion Game
Reinforcement Learning with Self-play
Preliminaries
Quadrotor Drone Dynamics
Reinforcement Learning
Markov Decision Process (MDP)
Policy Gradient Theorem
Proximal Policy Optimization Algorithm
Multi-Pursuit Evasion with Targeted Navigation Problem
Problem Formulation
MDP in Adversarial Games
RL Description of MPETN
...and 24 more sections

Key Result

Theorem 1

For the adversarial mixed game $\{\mathcal{M}^r, \mathcal{M}^c\}$, there always exists a unique Nash equilibrium (NE), and this unique NE is the optimal policy $\pi_{\theta}^{r*}$ for the runner agent, i.e., the solution of the pursuit-evasion problem.
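
For reference, the standard Nash-equilibrium condition for a two-sided game such as $\{\mathcal{M}^r, \mathcal{M}^c\}$ can be written as below. The expected-return symbols $J^r$ and $J^c$ and the chaser policy $\pi_{\phi}^{c}$ are notation assumed here for illustration; the paper's Definition 1 may be stated differently.

$$
J^r\left(\pi_{\theta}^{r*}, \pi_{\phi}^{c*}\right) \ge J^r\left(\pi_{\theta}^{r}, \pi_{\phi}^{c*}\right) \;\; \forall\, \pi_{\theta}^{r},
\qquad
J^c\left(\pi_{\theta}^{r*}, \pi_{\phi}^{c*}\right) \ge J^c\left(\pi_{\theta}^{r*}, \pi_{\phi}^{c}\right) \;\; \forall\, \pi_{\phi}^{c}
$$

In words, neither the runner nor the chasers can improve their own expected return by changing policy unilaterally.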

Figures (11)

Figure 1: Evading multiple pursuers with learned policy. The runner (evader) is labeled as white, and the two chasers (pursuers) are labeled as blue. The target is a box with AprilTag.
Figure 2: The simulation environment for the 3D multi-pursuit evasion with targeted navigation scenario, which consists of two blue chaser drones (pursuers) and one white runner drone (evader) flying towards the target (red box). All simulated objects are enclosed in box colliders ($L\times W \times H$) for collision detection. The chasers aim to bring down the runner, while the runner must reach the target safely.
Figure 3: Illustration of the AMS-DRL training sequence for the shared chaser policy and the runner policy.
Figure 4: Illustration of the neural network architecture for chaser agents and the runner agent.
Figure 5: Average reward of AMS-DRL during training. The training phases are denoted by $S_0$-$S_1$. (a) Runner policy; (b) Chaser policy.
...and 6 more figures

Theorems & Definitions (6)

Definition 1: Nash Equilibrium of Adversarial Mixed Game
Theorem 1
Proof
Remark 1
Theorem 2: Convergence Analysis
Proof
