Markov Potential Game Construction and Multi-Agent Reinforcement Learning with Applications to Autonomous Driving

Huiwen Yan, Mushuang Liu

TL;DR

The paper addresses the difficulty of reaching a Nash equilibrium (NE) in general-sum Markov games (MGs) by working with Markov potential games (MPGs), a class with guaranteed pure NE existence and gradient-play convergence. It provides sufficient conditions on the reward design and the MDP under which an MG is an MPG, and shows how a total potential function can drive gradient ascent to an NE. The methodology is applied to autonomous driving at intersections, where a carefully designed reward structure yields a potential function that supports robust, safe, and efficient multi-vehicle coordination. The results indicate that MARL with MPGs outperforms single-agent RL in robustness while maintaining safety across diverse surrounding-vehicle policies. The work offers a practical, theoretically grounded approach to MARL for autonomous driving and suggests broader applicability to other multi-agent system (MAS) domains with similar structural properties.

Abstract

Markov games (MGs) provide a mathematical foundation for multi-agent reinforcement learning (MARL), enabling self-interested agents to learn their optimal policies while interacting with others in a shared environment. However, due to the complexities of an MG problem, seeking (Markov perfect) Nash equilibrium (NE) is often very challenging for a general-sum MG. Markov potential games (MPGs), which are a special class of MGs, have appealing properties such as guaranteed existence of pure NEs and guaranteed convergence of gradient play algorithms, thereby leading to desirable properties for many MARL algorithms in their NE-seeking processes. However, the question of how to construct MPGs has remained open. This paper provides sufficient conditions on the reward design and on the Markov decision process (MDP), under which an MG is an MPG. Numerical results on autonomous driving applications are reported.
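The two MPG properties named in the abstract (a potential function that tracks every unilateral deviation, and convergence of gradient play) can be illustrated on a toy example. The sketch below is not the paper's construction: it assumes a hypothetical single-state, two-agent, two-action game whose rewards are a shared potential `PHI` plus a term that depends only on the other agent's action, and it runs projected gradient ascent under a direct parameterization. All names and numbers are illustrative assumptions.

```python
from itertools import product

# Hypothetical 2-agent, 2-action, single-state game (illustrative numbers only).
PHI = {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 1.0}  # candidate potential
# Each agent's reward = potential + a term depending only on the OTHER agent's action.
DUMMY = [{0: 0.5, 1: -1.0},   # agent 0's extra term, indexed by agent 1's action
         {0: -0.3, 1: 0.7}]   # agent 1's extra term, indexed by agent 0's action

def reward(i, a):
    return PHI[a] + DUMMY[i][a[1 - i]]

def is_exact_potential(tol=1e-9):
    """Check that every unilateral deviation changes r_i and PHI by the same amount."""
    for i, a in product(range(2), product(range(2), repeat=2)):
        for dev in range(2):
            b = list(a)
            b[i] = dev
            b = tuple(b)
            if abs((reward(i, b) - reward(i, a)) - (PHI[b] - PHI[a])) > tol:
                return False
    return True

def expected_reward(i, p):
    """p[k] = probability that agent k plays action 0 (direct parameterization)."""
    total = 0.0
    for a in product(range(2), repeat=2):
        prob = 1.0
        for k in range(2):
            prob *= p[k] if a[k] == 0 else 1.0 - p[k]
        total += prob * reward(i, a)
    return total

def own_gradient(i, p):
    """d/dp_i of agent i's expected reward; exact because it is linear in p_i."""
    hi = list(p)
    hi[i] = 1.0
    lo = list(p)
    lo[i] = 0.0
    return expected_reward(i, hi) - expected_reward(i, lo)

def gradient_play(p, step=0.1, iters=500):
    """Simultaneous projected gradient ascent; projection is clipping to [0, 1]."""
    for _ in range(iters):
        grads = [own_gradient(i, p) for i in range(2)]
        p = [min(1.0, max(0.0, p[i] + step * grads[i])) for i in range(2)]
    return p

def max_deviation_gain(p):
    """Largest gain any agent can get by switching alone to a pure action."""
    gains = []
    for i in range(2):
        for dev in (0.0, 1.0):
            q = list(p)
            q[i] = dev
            gains.append(expected_reward(i, q) - expected_reward(i, p))
    return max(gains)

if __name__ == "__main__":
    print("exact potential:", is_exact_potential())
    p_final = gradient_play([0.6, 0.6])
    print("final policies:", p_final)
    print("max unilateral gain:", max_deviation_gain(p_final))
```

Starting from `[0.6, 0.6]`, both agents end up playing action 0 with probability 1, and no unilateral deviation improves either agent's reward. In this toy game other initializations can reach the second pure equilibrium instead, so the sketch shows convergence to some NE, not to a particular one.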

Paper Structure

This paper contains 14 sections, 7 theorems, 35 equations, 4 figures, 2 tables.

Key Result

Lemma 1

(Gradient domination [gradienPlayLiNa]) For the direct distributed parameterization (eq:parametrization), the following inequality holds for any $\theta=(\theta_1,\cdots,\theta_N)\in \mathcal{X}$ and any $\theta_i'\in\mathcal{X}_i$, $i\in\mathcal{N}$: [...] where $\left\|\frac{d_{\theta'}}{d_\theta}\right\|_{\infty}\coloneqq\max_s\frac{d_{\theta'}(s)}{d_{\theta}(s)}$ and $\theta'=(\theta_i',\theta_{-i})$.

Figures (4)

  • Figure 1: The four-vehicle intersection scenario.
  • Figure 2: NN architecture.
  • Figure 3: Vehicles' performance in Scenario 1. (a): The ego vehicle decelerates and waits to avoid collisions; (b): The ego vehicle drives around the desired velocity after crossing; (c): The velocity histories of all vehicles.
  • Figure 4: Vehicles' performance in Scenario 2. (a): The ego vehicle decelerates to yield to the surrounding vehicles; (b): The ego vehicle drives around the desired velocity after crossing; (c): The velocity histories of all vehicles.

Theorems & Definitions (14)

  • Definition 1
  • Definition 2
  • Lemma 1
  • Theorem 1
  • proof
  • Definition 3
  • Proposition 1
  • Theorem 2
  • Theorem 3
  • proof
  • ...and 4 more