A Scalable Game Theoretic Approach for Coordination of Multiple Dynamic Systems

Mostafa M. Shibl, Vijay Gupta

TL;DR

This work considers the case where the dynamics of the coupled system can be modeled as a Markov potential game, and shows that by limiting information flow to local neighborhoods, agents’ policies can still converge to near-optimal policies.

Abstract

Learning in games provides a powerful framework to design control policies for self-interested agents that may be coupled through their dynamics, costs, or constraints. We consider the case where the dynamics of the coupled system can be modeled as a Markov potential game. In this case, distributed learning by the agents ensures that their control policies converge to a Nash equilibrium of this game. However, typical learning algorithms such as natural policy gradient require knowledge of the entire global state and actions of all the other agents, and may not be scalable as the number of agents grows. We show that by limiting the information flow to a local neighborhood of agents in the natural policy gradient algorithm, we can converge to a neighborhood of optimal policies. If the game can be designed through decomposing a global cost function of interest to a designer into local costs for the agents such that their policies at equilibrium optimize the global cost, this approach can be of interest to team coordination problems as well. We illustrate our approach through a sensor coverage problem.

Paper Structure

This paper contains 10 sections, 8 theorems, 21 equations, 6 figures.

Key Result

Theorem 1

Consider a Markov potential game (MPG) in which all agents update their policies according to the independent natural policy gradient algorithm. For a sufficiently small step size $\eta$, independent natural policy gradient exhibits last-iterate (asymptotic) convergence to the optimal Nash equilibrium policy.
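
To make the update in Theorem 1 concrete, the following is a minimal sketch, not the authors' implementation. It assumes softmax policies, for which the natural policy gradient step is commonly written as a multiplicative-weights update, and it simplifies the setting to a single-state, two-agent game with a shared reward (an identical-interest game, which is a potential game). The paper works with costs and multi-state dynamics; the reward matrix, the function names `npg_step` and `independent_npg`, and the values of `eta` and `steps` are all hypothetical choices for illustration.

```python
import math


def npg_step(policy, q_values, eta):
    """One softmax NPG step: pi(a) <- pi(a) * exp(eta * advantage(a)), renormalized."""
    value = sum(p * q for p, q in zip(policy, q_values))
    weights = [p * math.exp(eta * (q - value)) for p, q in zip(policy, q_values)]
    total = sum(weights)
    return [w / total for w in weights]


def independent_npg(reward, eta=0.5, steps=100):
    """Run independent NPG for two agents in a single-state shared-reward game.

    reward[a][b] is the reward both agents receive when agent 1 plays a
    and agent 2 plays b (an assumption of this sketch, not the paper's model).
    """
    n1, n2 = len(reward), len(reward[0])
    p1 = [1.0 / n1] * n1  # start from uniform policies
    p2 = [1.0 / n2] * n2
    for _ in range(steps):
        # Each agent evaluates its own actions against the other's current policy.
        q1 = [sum(p2[b] * reward[a][b] for b in range(n2)) for a in range(n1)]
        q2 = [sum(p1[a] * reward[a][b] for a in range(n1)) for b in range(n2)]
        # Simultaneous updates: both steps use the policies from the previous round.
        p1, p2 = npg_step(p1, q1, eta), npg_step(p2, q2, eta)
    return p1, p2


if __name__ == "__main__":
    # Coordination game: matching on action 1 pays more than matching on action 0.
    p1, p2 = independent_npg([[1.0, 0.0], [0.0, 2.0]])
    print(p1, p2)
```

In this toy game both policies concentrate on the second action, which is the joint action with the highest shared reward. The locality idea described in the abstract would correspond to each agent computing its action values from neighbors only; that truncation is not modeled here.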

Figures (6)

  • Figure 1: Job Balancing Game Network Diagram
  • Figure 2: Convergence Results of Independent Natural Policy Gradient for Job Balancing Game Problem
  • Figure 3: Percentage relative error of $\epsilon$ based on $\kappa$ for Job Balancing Game Problem
  • Figure 4: Sensor Coverage Node Diagram
  • Figure 5: Convergence Results of Independent Natural Policy Gradient for Sensor Coverage Problem
  • ...and 1 more figure

Theorems & Definitions (15)

  • Definition 1
  • Definition 2: Equilibrium and $\epsilon$-Equilibrium Joint Policies
  • Theorem 1: Theorem 1.1 in rf
  • Theorem 2
  • Lemma 3
  • proof
  • Lemma 4
  • proof
  • Lemma 5
  • proof
  • ...and 5 more