A Scalable Game Theoretic Approach for Coordination of Multiple Dynamic Systems
Mostafa M. Shibl, Vijay Gupta
TL;DR
This work considers coupled systems whose dynamics can be modeled as a Markov potential game and shows that, even when information flow is limited to local neighborhoods, the agents’ policies still converge to near-optimal ones.
Abstract
Learning in games provides a powerful framework for designing control policies for self-interested agents that may be coupled through their dynamics, costs, or constraints. We consider the case where the dynamics of the coupled system can be modeled as a Markov potential game. In this case, distributed learning by the agents ensures that their control policies converge to a Nash equilibrium of the game. However, typical learning algorithms such as natural policy gradient require knowledge of the global state and of the actions of all other agents, so they may not scale as the number of agents grows. We show that when the information flow in the natural policy gradient algorithm is limited to a local neighborhood of each agent, the policies still converge to a neighborhood of the optimal policies. If the game can be designed by decomposing a global cost function of interest to a designer into local costs for the agents, such that the agents' equilibrium policies optimize the global cost, this approach is also of interest for team coordination problems. We illustrate our approach through a sensor coverage problem.
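The idea of restricting each agent's update to local information can be illustrated with a minimal sketch. It is not the paper's algorithm or its sensor coverage experiment. It assumes a toy stateless coordination game on a line graph, in which agent i earns decay**|i-j| whenever it picks the same action as agent j; this is a potential game because the pairwise couplings are symmetric. Each agent applies the multiplicative-weights form of the softmax natural policy gradient update, using only the policies of agents within kappa hops. The function name local_npg and the parameters kappa, decay, and eta are hypothetical choices made for this illustration.

```python
import numpy as np

def local_npg(n_agents=4, kappa=1, decay=0.5, eta=1.0, n_iters=200):
    """Toy softmax NPG with kappa-hop truncated information (illustrative only)."""
    idx = np.arange(n_agents)
    dist = np.abs(idx[:, None] - idx[None, :])         # hop distance on a line graph
    weights = np.where(dist > 0, decay ** dist, 0.0)   # symmetric pairwise couplings
    local = np.where(dist <= kappa, weights, 0.0)      # ignore agents beyond kappa hops

    # Two actions per agent; agent 0 starts slightly biased to break symmetry.
    pi = np.full((n_agents, 2), 0.5)
    pi[0] = [0.6, 0.4]

    for _ in range(n_iters):
        # q[i, a]: expected reward of action a for agent i, estimated from
        # the current policies of its kappa-hop neighbors only.
        q = local @ pi
        adv = q - np.sum(pi * q, axis=1, keepdims=True)  # advantage vs. current policy
        pi = pi * np.exp(eta * adv)                      # NPG step for softmax policies
        pi /= pi.sum(axis=1, keepdims=True)              # renormalize each agent's policy
    return pi

if __name__ == "__main__":
    for kappa in (1, 3):
        print(kappa, np.round(local_npg(kappa=kappa), 3))
```

In this toy game the truncated update (kappa=1) reaches the same coordinated profile as the full-information update (kappa=3). In general, discarding distant agents introduces an approximation error, which is why the result in the abstract is stated as convergence to a neighborhood of the optimal policies.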
