Strategizing at Speed: A Learned Model Predictive Game for Multi-Agent Drone Racing

Andrei-Carlo Papuc, Lasse Peters, Sihao Sun, Laura Ferranti, Javier Alonso-Mora

TL;DR

This work analyzes the tension between deep, interaction-aware planning and fast, reactive decision-making in multi-agent drone racing. It shows that while the Model Predictive Game (MPG) outperforms contouring MPC at moderate speeds, its advantage erodes under latency at higher speeds. This motivates the Learned Model Predictive Game (LMPG), which amortizes the game-solving computation through offline training. LMPG uses a differentiable trajectory-optimization layer to predict Nash strategies with low inference latency (about $3.5$ ms), and it outperforms both MPG and MPC in simulated and real-world races. The approach offers a practical path to real-time, interaction-aware planning for high-speed aerial robotics, with potential extensions to onboard perception and cluttered environments.

Abstract

Autonomous drone racing pushes the boundaries of high-speed motion planning and multi-agent strategic decision-making. Success in this domain requires drones not only to navigate at their limits but also to anticipate and counteract competitors' actions. In this paper, we study a fundamental question that arises in this domain: how deeply should an agent strategize before taking an action? To this end, we compare two planning paradigms: the Model Predictive Game (MPG), which finds interaction-aware strategies at the expense of longer computation times, and contouring Model Predictive Control (MPC), which computes strategies rapidly but does not reason about interactions. We perform extensive experiments to study this trade-off, revealing that MPG outperforms MPC at moderate velocities but loses its advantage at higher speeds due to latency. To address this shortcoming, we propose a Learned Model Predictive Game (LMPG) approach that amortizes model predictive gameplay to reduce latency. In both simulation and hardware experiments, we benchmark our approach against MPG and MPC in head-to-head races, finding that LMPG outperforms both baselines.

Paper Structure

This paper contains 22 sections, 7 equations, 5 figures, 2 tables, 1 algorithm.

Figures (5)

  • Figure 1: Chronophotography of a real-world autonomous overtaking maneuver on a lemniscate track. The interaction-aware blue drone employs a game-theoretic planner to overtake the red drone through the gate. The red drone utilizes contouring MPC with a constant velocity prediction model, treating the opponent as a dynamic obstacle.
  • Figure 2: (A) Composition of the observations used by the neural network employed by LMPG (cf. the LMPG input equation). (B) Pipeline overview of LMPG: observations are fed into a neural network that embeds differentiable trajectory optimization as a final layer to predict a receding-horizon strategy. (C) A linear MPC tracks the reference strategy via feedback linearization. In simulation, the output of this controller is directly fed into the simulated dynamics. (D) Hierarchical control scheme employed in hardware experiments: using the Agilicious framework (Foehn et al., 2022), the drone tracks the receding-horizon strategy by combining a geometric controller with incremental nonlinear dynamic inversion (INDI) (Sun et al., 2022).
  • Figure 3: The three race tracks used for experimental evaluation: (a) Lemniscate, (b) Lissajous, and (c) 3D Lemniscate. The colored lines show sample trajectories from the head-to-head tournament, visualizing common overtaking maneuvers between the competing methods.
  • Figure 4: Head-to-head racing results in simulation and in real-world flights. (A, B) Win rates for MPC vs. MPG in a simulated tournament with synchronous (A) and asynchronous (B) execution modes. (D, E) Win rates for LMPG vs. MPC/MPG in a simulated tournament with synchronous (D) and asynchronous (E) execution modes. (C, F) Win rates for the real-world flight tournament. (G) Solve time distributions for all methods. (H) Visualization of a specific MPG failure case on the lemniscate track: green annotations highlight a sequence of maneuvers in which (1) MPG trails MPC, (2) MPG overtakes, and (3) after MPC counter-overtakes, MPG fails to solve in time and deviates off-track. Summary: While MPG is competitive in synchronous settings, it suffers from performance-degrading computational delays in asynchronous mode. In contrast, LMPG uses offline training to learn a fast policy unaffected by online delays.
  • Figure 5: Sensitivity of racing performance to latency. At negligible delay, MPG dominates MPC. As artificial delay is added to MPG's strategy (blue line), its performance degrades, allowing MPC to win more races.
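
The abstract and Figure 5 attribute MPG's loss of advantage at higher speeds to latency. A back-of-the-envelope way to see why, offered as a sketch and not as the authors' analysis: while the planner is still solving, the drone keeps flying, so the resulting plan refers to a state that is already roughly speed times latency behind. In the Python snippet below, the function name, the speeds, and the 50 ms comparison latency are hypothetical illustration values; only the approximately 3.5 ms LMPG inference latency comes from the TL;DR.

```python
"""Toy estimate of how far a drone travels while its planner is still solving.

All numbers except the ~3.5 ms LMPG inference latency are hypothetical.
"""

LMPG_LATENCY_MS = 3.5                # inference latency reported in the TL;DR
HYPOTHETICAL_SLOW_LATENCY_MS = 50.0  # assumed for illustration, NOT from the paper


def stale_distance_m(speed_mps: float, latency_ms: float) -> float:
    """Distance (m) covered at constant speed during `latency_ms` of planning."""
    if speed_mps < 0 or latency_ms < 0:
        raise ValueError("speed and latency must be non-negative")
    return speed_mps * latency_ms / 1000.0


if __name__ == "__main__":
    for speed in (2.0, 5.0, 10.0):  # hypothetical speeds in m/s
        fast = stale_distance_m(speed, LMPG_LATENCY_MS)
        slow = stale_distance_m(speed, HYPOTHETICAL_SLOW_LATENCY_MS)
        print(f"{speed:4.1f} m/s: {fast:.4f} m at 3.5 ms, {slow:.4f} m at 50 ms")
```

For a fixed latency, this staleness grows linearly with speed, which is consistent with the qualitative trend described for Figure 5. The constant-speed assumption ignores the drone dynamics and any opponent prediction, so the numbers indicate scale only.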