Agile Interception of a Flying Target using Competitive Reinforcement Learning

Timothée Gavin; Simon Lacroix; Murat Bronz

Abstract

This article presents a solution for intercepting an agile drone with another agile drone carrying a catching net. We formulate the interception as a Competitive Reinforcement Learning problem, where the interceptor and the target drone are controlled by separate policies trained with Proximal Policy Optimization (PPO). We introduce a high-fidelity simulation environment that integrates a realistic quadrotor dynamics model and a low-level control architecture implemented in JAX, which allows for fast parallelized execution on GPUs. We train the agents on low-level control commands (collective thrust and body rates) to achieve agile flight for both the interceptor and the target. We compare the trained policies against common heuristic baselines in terms of catch rate, time to catch, and crash rate, and show that our solution outperforms these baselines when intercepting agile targets. Finally, we demonstrate the trained policies in a scaled real-world scenario using agile drones inside an indoor flight arena.

Paper Structure (18 sections, 2 equations, 8 figures, 3 tables)

Introduction
Related Work on the Agile Interception problem
Agile flight
Interception
Our approach
Agile Flight Simulation Environment
Interception of an Agile Target using Reinforcement Learning
Reinforcement Learning
Pursuit-evasion problem
Observation, actions, and rewards
Training details
Experimental Results
Training Results
Evaluation in Simulation
Qualitative Results in Simulation
...and 3 more sections

Figures (8)

Figure 1: A competitive reinforcement learning approach to train both a pursuer and an evader drone for agile interception tasks. Both agents learn low-level control policies that enable them to perform dynamic maneuvers in a high-fidelity simulation environment.
Figure 2: Control architecture used for the quadrotor dynamics simulation.
Figure 3: Schematic of the interception problem. Capture happens when the distance $d$ between the evader's centre and the pursuer's net falls below the capture distance.
Figure 4: Comparison of learning curves and average episode length.
Figure 5: Evasive manoeuvres: from top left to bottom right, the evader (green) performs a vertical escape, a dive, a sharp turn, and a sudden stop followed by a feint.
...and 3 more figures
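
The capture condition described in the Figure 3 caption can be illustrated with a minimal JAX sketch, batched over parallel environments in the spirit of the GPU-parallel simulation mentioned in the abstract. This is not the authors' implementation: the function name `is_captured`, the 0.5 m value of `CAPTURE_DISTANCE`, the strict inequality, and the use of a single point for the net position are all assumptions made for illustration.

```python
import jax
import jax.numpy as jnp

# Hypothetical value: the actual capture distance is not given in this excerpt.
CAPTURE_DISTANCE = 0.5  # metres


def is_captured(net_pos, evader_pos, capture_distance):
    """Return True when the distance d between the evader's centre
    and the pursuer's net is below the capture distance."""
    d = jnp.linalg.norm(evader_pos - net_pos)
    return d < capture_distance


# Map the check over a batch of environments (axis 0 of both positions),
# sharing one capture distance, and compile it with jit.
batched_is_captured = jax.jit(jax.vmap(is_captured, in_axes=(0, 0, None)))

# Two example environments: evader 0.3 m and 2.0 m away from the net.
net_positions = jnp.array([[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]])
evader_positions = jnp.array([[0.3, 0.0, 1.0], [2.0, 0.0, 1.0]])

captured = batched_is_captured(net_positions, evader_positions, CAPTURE_DISTANCE)
```

With these example inputs, the first environment counts as a capture and the second does not. In a training loop, a flag of this kind would typically terminate the episode, though how the paper uses it is not stated in this excerpt.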
