Markov Potential Game and Multi-Agent Reinforcement Learning for Autonomous Driving

Huiwen Yan; Mushuang Liu

Abstract

Autonomous driving (AD) requires safe and reliable decision-making among interacting agents, e.g., vehicles, bicycles, and pedestrians. Multi-agent reinforcement learning (MARL) modeled by Markov games (MGs) provides a suitable framework for characterizing these agents' interactions during decision-making. Nash equilibria (NEs) are often the desired solution of an MG. Computing an NE is typically challenging in general-sum games, unless the game is a Markov potential game (MPG), in which case an NE is attainable under certain learning algorithms such as gradient play. It has remained an open question, however, how to construct an MPG and whether such construction rules are suitable for AD applications. In this paper, we provide sufficient conditions under which an MG is an MPG and show, using highway forced merge scenarios as illustrative examples, that these conditions can accommodate general driving objectives for autonomous vehicles (AVs). A parameter-sharing neural network (NN) structure is designed to enable decentralized policy execution. The driving policy trained from MPGs is evaluated on both simulated and naturalistic traffic datasets. Comparative studies are reported against single-agent RL and against the human drivers whose behaviors are recorded in the traffic datasets.
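
For readers unfamiliar with the term, the MPG property mentioned above is commonly stated as follows. This is the standard textbook-style condition written in assumed notation ($V_i$ for agent $i$'s value function, $\Phi$ for the potential, $\pi_i$ for agent $i$'s policy); the paper's own Definition may use different symbols or a slightly different form.

$$
V_i^{(\pi_i',\,\pi_{-i})}(s) - V_i^{(\pi_i,\,\pi_{-i})}(s)
= \Phi^{(\pi_i',\,\pi_{-i})}(s) - \Phi^{(\pi_i,\,\pi_{-i})}(s)
\quad \text{for all } i\in\mathcal{N},\ \text{all states } s,\ \text{and all policies } \pi_i,\ \pi_i',\ \pi_{-i}.
$$

In words, any change in one agent's value caused by that agent's unilateral policy change equals the change in a single shared function $\Phi$, which is why ascent on the individual objectives can be analyzed as ascent on $\Phi$.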

Paper Structure (13 sections, 7 theorems, 42 equations, 10 figures, 4 tables, 1 algorithm)

Introduction
Problem formulation
Markov Games
Multi-agent reinforcement learning
Markov potential game
Definition and Properties of MPGs
Construction of MPGs
Multi-agent reinforcement learning design
Numerical results in highway forced merge scenarios
Simulation setup
Numerical results with simulated surrounding vehicles
Verification on the naturalistic driving dataset
Conclusion

Key Result

Lemma 1

(Gradient domination, gradientPlay) For the direct distributed parameterization (eq:direct), the following inequality holds for any $\theta=(\theta_1,\cdots,\theta_N)\in \mathcal{X}$ and any $\theta_i'\in\mathcal{X}_i$, $i\in\mathcal{N}$: where $\left\|\frac{d_{\theta'}}{d_\theta}\right\|_{\infty}\coloneqq\max_s\frac{d_{\theta'}(s)}{d_{\theta}(s)}$ and $\theta'=(\theta_i',\theta_{-i})$.
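
To make the setting of this lemma more concrete, the following is a minimal sketch of projected gradient play under a direct parameterization, where each agent's parameter $\theta_i$ is itself a probability vector over actions. It is not the paper's algorithm or driving model: it assumes a single-state, two-player, two-action game, and the payoff tables `P`, `D1`, `D2`, the step size `eta`, and all function names are hypothetical choices made only for illustration.

```python
# Hypothetical single-state potential game (not taken from the paper).
P = [[2.0, 0.0], [0.0, 1.0]]   # shared potential over joint actions (a1, a2)
D1 = [0.5, -0.3]               # extra term for player 1, depends only on a2
D2 = [1.0, 0.2]                # extra term for player 2, depends only on a1
U1 = [[P[a][b] + D1[b] for b in range(2)] for a in range(2)]  # player 1 payoff
U2 = [[P[a][b] + D2[a] for b in range(2)] for a in range(2)]  # player 2 payoff


def project_simplex(v):
    """Euclidean projection of a vector onto the probability simplex."""
    u = sorted(v, reverse=True)
    css, tau = 0.0, 0.0
    for k, uk in enumerate(u, start=1):
        css += uk
        t = (css - 1.0) / k
        if uk - t > 0:
            tau = t  # threshold from the largest k that satisfies the test
    return [max(x - tau, 0.0) for x in v]


def gradient_play(theta1, theta2, eta=0.1, steps=500):
    """Simultaneous projected gradient ascent on each player's own payoff."""
    for _ in range(steps):
        # Gradient of theta1^T U1 theta2 w.r.t. theta1, and of
        # theta1^T U2 theta2 w.r.t. theta2, both at the current iterate.
        g1 = [sum(U1[a][b] * theta2[b] for b in range(2)) for a in range(2)]
        g2 = [sum(U2[a][b] * theta1[a] for a in range(2)) for b in range(2)]
        theta1 = project_simplex([t + eta * g for t, g in zip(theta1, g1)])
        theta2 = project_simplex([t + eta * g for t, g in zip(theta2, g2)])
    return theta1, theta2


theta1, theta2 = gradient_play([0.6, 0.4], [0.6, 0.4])
print(theta1, theta2)
```

Because `D1` does not depend on player 1's action and `D2` does not depend on player 2's action, each player's payoff difference under a unilateral deviation equals the corresponding difference in `P`, so `P` acts as a potential. From the chosen starting point, the iterates move toward the pure profile in which both players select their first action, which is a Nash equilibrium of this toy game; other starting points may lead to a different equilibrium.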

Figures (10)

Figure 1: Parameter-sharing policy network architecture.
Figure 2: Single-lane highway forced merge scenario.
Figure 3: Convergence of the mean potential function during training.
Figure 4: Convergence of the mean squared action difference during training.
Figure 5: Illustration of the merging behavior in Scenario 1 of the simulated data. (a) Initial state. (b) Ego vehicle before merging. (c) Ego vehicle after merging.
...and 5 more figures

Theorems & Definitions (16)

Definition 1
Definition 2
Lemma 1
Theorem 1
Definition 3
Theorem 2
Theorem 3
Theorem 4
proof
Remark 1
...and 6 more
