Learning Two-agent Motion Planning Strategies from Generalized Nash Equilibrium for Model Predictive Control

Hansung Kim, Edward L. Zhu, Chang Seok Lim, Francesco Borrelli

TL;DR

This work tackles real-time two-agent motion planning by learning game-theoretic (GT) interaction outcomes from generalized Nash equilibrium (GNE) data and embedding them as a terminal cost in model predictive control (MPC). The method, IGT-MPC, combines offline GNE data generation, a neural network that predicts GT rewards, and an online MPC that uses this predictor to account implicitly for the other agent. It is demonstrated in two scenarios, competitive head-to-head racing and cooperative intersection navigation, where MPC guided by the learned value function $V_{GT}$ achieves higher feasibility, fewer gridlocks, and more strategic behavior than MPC with a naive progress-maximizing terminal cost. While the method is effective for two agents, the authors acknowledge scalability and generalization challenges and propose richer representations and faster solvers as avenues for future work.

Abstract

We introduce an Implicit Game-Theoretic MPC (IGT-MPC), a decentralized algorithm for two-agent motion planning that uses a learned value function that predicts the game-theoretic interaction outcomes as the terminal cost-to-go function in a model predictive control (MPC) framework, guiding agents to implicitly account for interactions with other agents and maximize their reward. This approach applies to competitive and cooperative multi-agent motion planning problems which we formulate as constrained dynamic games. Given a constrained dynamic game, we randomly sample initial conditions and solve for the generalized Nash equilibrium (GNE) to generate a dataset of GNE solutions, computing the reward outcome of each game-theoretic interaction from the GNE. The data is used to train a simple neural network to predict the reward outcome, which we use as the terminal cost-to-go function in an MPC scheme. We showcase emerging competitive and coordinated behaviors using IGT-MPC in scenarios such as two-vehicle head-to-head racing and un-signalized intersection navigation. IGT-MPC offers a novel method integrating machine learning and game-theoretic reasoning into model-based decentralized multi-agent motion planning.
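The pipeline in the abstract (fit a predictor of reward outcomes offline, then use it as the terminal cost-to-go inside MPC) can be illustrated with a deliberately tiny sketch. Everything here is an assumption for illustration and not the authors' implementation: the dynamics are a 1-D double integrator, the "dataset" is synthetic (reward = ego lead over the other agent) instead of actual GNE solutions, a least-squares fit stands in for the neural network, the other agent's state is held fixed over the horizon, stage costs are omitted, and the MPC is solved by brute-force enumeration. The names `rollout`, `fit_value`, and `mpc_step` are hypothetical.

```python
import itertools
import numpy as np


def rollout(x0, controls, dt=0.1):
    """Integrate a toy 1-D double integrator; state = (position, velocity)."""
    pos, vel = x0
    for u in controls:
        vel = vel + dt * u
        pos = pos + dt * vel
    return np.array([pos, vel])


def fit_value(features, rewards):
    """Least-squares stand-in for the learned reward predictor (assumption)."""
    A = np.hstack([features, np.ones((len(features), 1))])
    w, *_ = np.linalg.lstsq(A, rewards, rcond=None)
    return lambda z: float(np.append(z, 1.0) @ w)


def mpc_step(x_ego, x_other, value_fn, horizon=3, actions=(-1.0, 0.0, 1.0)):
    """Return the first control of the sequence with the best terminal value.

    The learned value is evaluated on the joint terminal state, so the ego
    agent accounts for the other agent only through value_fn.
    """
    best_u, best_score = None, -np.inf
    for seq in itertools.product(actions, repeat=horizon):
        x_terminal = rollout(x_ego, seq)
        score = value_fn(np.concatenate([x_terminal, x_other]))
        if score > best_score:
            best_u, best_score = seq[0], score
    return best_u


# Synthetic placeholder data: joint states (ego pos, ego vel, other pos, other vel)
# and a made-up reward (ego lead). Real data would come from solved GNE problems.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 4))
r = Z[:, 0] - Z[:, 2]
V = fit_value(Z, r)

u0 = mpc_step(np.array([0.0, 0.0]), np.array([1.0, 0.0]), V)
```

With this synthetic reward the fitted value simply favors a larger ego lead, so the selected first control is full acceleration. The point of the sketch is only the structure: the interaction model lives entirely in the terminal value, and the online problem stays a single-agent optimization.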

Paper Structure

This paper contains 22 sections, 15 equations, 9 figures, 2 tables, 1 algorithm.

Figures (9)

  • Figure 1: Two-agent interaction scenarios: a) competitive head-to-head racing, and b) cooperative un-signalized two-way intersection navigation, with colored squares showing vehicles' planned trajectories.
  • Figure 2: In a simulation experiment, the slower (green) vehicle is unable to defend its position against the faster (blue) vehicle with $V_{MP}$ (Top). The slower vehicle with $V_{GT}$ successfully defends its position against the faster car with $V_{GT}$ (Bottom). The heatmap represents the level curves to visualize the value functions used in this experiment. The red curve is the raceline and the colored squares and circles are the planned trajectories for vehicles with corresponding colors.
  • Figure 3: Histograms of the lead (in number of car lengths) of the faster vehicle over the slower vehicle at the end of each simulated race over 100 different initial conditions. Green bars indicate bins where the faster vehicle won the race, while red bars indicate losses. The black dashed lines represent average lead values over all simulation runs. Note that vertical scales vary between histograms.
  • Figure 4: In a simulation experiment, Vehicle 1 (green) on route SN and Vehicle 2 (blue) on route ES navigate through an intersection using $V_{GT}$ (Top). Colored squares represent each vehicle's planned trajectories. The middle and bottom rows display the contour of $V_{GT}$ as perceived by Vehicle 1 and 2, respectively. Colored circles indicate current states at each time instance, and colored stars denote the planned terminal state at time $t+N$.
  • Figure 5: In a simulation experiment, Vehicle 1 (green) on route SN and Vehicle 2 (blue) on route ES navigate through an intersection using $V_{MP}$ (Top), reaching a gridlock. Colored squares represent each vehicle's planned trajectories. The middle and bottom rows display the contour of $V_{MP}$ as perceived by Vehicle 1 and 2, respectively. Colored circles indicate current states at each time instance, and colored stars denote the planned terminal state at time $t+N$. The initial conditions are identical to those in Fig. 4.
  • ...and 4 more figures

Theorems & Definitions (3)

  • remark 1
  • remark 2
  • remark 3