Markov Potential Game and Multi-Agent Reinforcement Learning for Autonomous Driving

Huiwen Yan, Mushuang Liu

Abstract

Autonomous driving (AD) requires safe and reliable decision-making among interacting agents, e.g., vehicles, bicycles, and pedestrians. Multi-agent reinforcement learning (MARL) modeled by Markov games (MGs) provides a suitable framework for characterizing these interactions during decision-making. A Nash equilibrium (NE) is often the desired solution of an MG, but computing one is typically challenging in general-sum games unless the game is a Markov potential game (MPG), which ensures that an NE is attainable under a few learning algorithms such as gradient play. It has remained an open question, however, how to construct an MPG and whether the construction rules are suitable for AD applications. In this paper, we provide sufficient conditions under which an MG is an MPG and show, using highway forced-merge scenarios as illustrative examples, that these conditions can accommodate general driving objectives for autonomous vehicles (AVs). A parameter-sharing neural network (NN) structure is designed to enable decentralized policy execution. The trained driving policy from MPGs is evaluated on both simulated and naturalistic traffic datasets. Comparative studies are reported against single-agent RL and against the human drivers whose behaviors are recorded in the traffic datasets.
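To make the role of the potential function concrete, the sketch below runs gradient play on a toy two-player, two-action game with a single state. It is only an illustration under stated assumptions and not the driving game studied in the paper: the payoff tables `PHI`, `DUMMY_1`, `DUMMY_2` and the helper names are hypothetical. Each player's payoff is the shared potential plus a term that depends only on the other player's action, so a unilateral deviation changes the payoff and the potential by the same amount.

```python
# Toy single-state potential game (hypothetical numbers, for illustration only).
PHI = [[2.0, 0.0],
       [0.0, 1.0]]          # potential, indexed by (action of player 1, action of player 2)
DUMMY_1 = [0.5, -0.25]      # extra payoff of player 1, depends only on player 2's action
DUMMY_2 = [-1.0, 0.75]      # extra payoff of player 2, depends only on player 1's action


def payoff_1(a1, a2):
    return PHI[a1][a2] + DUMMY_1[a2]


def payoff_2(a1, a2):
    return PHI[a1][a2] + DUMMY_2[a1]


def clip01(x):
    """Projection onto [0, 1], i.e. onto the simplex of a two-action policy."""
    return min(1.0, max(0.0, x))


def gradient_play(p, q, step=0.1, iters=500):
    """Simultaneous projected gradient ascent on each player's own payoff.

    p, q: probability that player 1 / player 2 selects action 0.
    """
    for _ in range(iters):
        # Expected payoffs are linear in a player's own probability, so the
        # gradient is the expected payoff gap between action 0 and action 1.
        g1 = q * (payoff_1(0, 0) - payoff_1(1, 0)) \
            + (1 - q) * (payoff_1(0, 1) - payoff_1(1, 1))
        g2 = p * (payoff_2(0, 0) - payoff_2(0, 1)) \
            + (1 - p) * (payoff_2(1, 0) - payoff_2(1, 1))
        p, q = clip01(p + step * g1), clip01(q + step * g2)
    return p, q


if __name__ == "__main__":
    print(gradient_play(0.6, 0.5))
    print(gradient_play(0.2, 0.2))
```

In this toy game the two starting points settle on different pure strategy profiles, both of which are Nash equilibria and local maximizers of the potential. This shows that gradient play reaches an NE here, but not necessarily the one with the highest potential.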

Paper Structure

This paper contains 13 sections, 7 theorems, 42 equations, 10 figures, 4 tables, 1 algorithm.

Key Result

Lemma 1

(Gradient domination, gradientPlay) For the direct distributed parameterization (eq:direct), the following inequality holds for any $\theta=(\theta_1,\cdots,\theta_N)\in \mathcal{X}$ and any $\theta_i'\in\mathcal{X}_i$, $i\in\mathcal{N}$: where $\left\|\frac{d_{\theta'}}{d_\theta}\right\|_{\infty}\coloneqq\max_s\frac{d_{\theta'}(s)}{d_{\theta}(s)}$ and $\theta'=(\theta_i',\theta_{-i})$.
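The ratio $\max_s d_{\theta'}(s)/d_{\theta}(s)$ in the lemma can be computed directly in a small tabular setting. The sketch below is a minimal illustration that assumes $d_\theta$ is the discounted state visitation distribution induced by the joint policy, a definition this excerpt does not state. The function names, the transition matrices `P_OLD` and `P_NEW`, the initial distribution `RHO`, and the discount factor are all hypothetical.

```python
def visitation(P, rho, gamma, horizon=2000):
    """Approximate d(s) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s).

    P[s][s2] is the state transition probability under a fixed joint policy,
    rho is the initial state distribution. The infinite sum is truncated at
    `horizon` steps (assumed long enough for gamma^horizon to be negligible).
    """
    n = len(rho)
    d = [0.0] * n
    mu = list(rho)            # distribution of s_t, starting at t = 0
    weight = 1.0 - gamma      # (1 - gamma) * gamma^t
    for _ in range(horizon):
        for s in range(n):
            d[s] += weight * mu[s]
        mu = [sum(mu[s] * P[s][s2] for s in range(n)) for s2 in range(n)]
        weight *= gamma
    return d


def mismatch(d_new, d_old):
    """max_s d_new(s) / d_old(s); requires d_old(s) > 0 for every state."""
    return max(a / b for a, b in zip(d_new, d_old))


# Hypothetical two-state chains induced by theta and theta' = (theta_i', theta_{-i}).
P_OLD = [[0.9, 0.1], [0.5, 0.5]]
P_NEW = [[0.6, 0.4], [0.2, 0.8]]
RHO = [0.5, 0.5]

if __name__ == "__main__":
    d_old = visitation(P_OLD, RHO, gamma=0.9)
    d_new = visitation(P_NEW, RHO, gamma=0.9)
    print(mismatch(d_new, d_old))
```

Because both arguments are probability distributions over the same states, the ratio is always at least 1, and it equals 1 only when the two distributions coincide.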

Figures (10)

  • Figure 1: Parameter-sharing policy network architecture.
  • Figure 2: Single-lane highway forced merge scenario.
  • Figure 3: Convergence of the mean potential function during training.
  • Figure 4: Convergence of the mean squared action difference during training.
  • Figure 5: Illustration of the merging behavior in Scenario 1 of the simulated data. (a) Initial state. (b) Ego vehicle before merging. (c) Ego vehicle after merging.
  • ...and 5 more figures

Theorems & Definitions (16)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Theorem 1
  • Definition 3
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Proof
  • Remark 1
  • ...and 6 more