Decentralized Multi-Agent Reinforcement Learning for Continuous-Space Stochastic Games
Awni Altabaa, Bora Yongacoglu, Serdar Yüksel
TL;DR
The paper addresses decentralized MARL in stochastic games with general state spaces by extending decentralized Q-learning to continuous spaces through state-space quantization and a two-time-scale learning scheme. It proves that, under weak continuity and bounded costs, agents achieve near-optimal policy updates with respect to their observed environments, and it characterizes global policy-updating dynamics as an absorbing Markov chain with a closed-form expression for equilibrium convergence probabilities. By analyzing both idealized updating dynamics and their quantized approximations, the work provides conditions under which self-play converges to (near-)equilibria and discusses limitations in achieving global team-optimality. A simulation study on a two-player stochastic team corroborates the theory, illustrating convergence behavior and the impact of quantization and exploration on attaining team-optimal policies.
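As a rough illustration of the state-space quantization mentioned above, the following sketch maps a one-dimensional continuous state onto a finite grid and applies a single tabular Q-learning step on the resulting bin indices. It is not the paper's algorithm: the uniform grid, the discounted-cost update, and the names `quantize`, `q_update`, `alpha`, and `beta` are all assumptions made for illustration, and the two-time-scale policy-update layer is omitted entirely.

```python
def quantize(x, low, high, n_bins):
    """Map a continuous state x to a bin index in 0..n_bins-1.

    Assumes a uniform grid over [low, high]; states outside the
    interval are clipped to the nearest endpoint.
    """
    if not low < high:
        raise ValueError("low must be strictly less than high")
    x = min(max(x, low), high)
    idx = int((x - low) / (high - low) * n_bins)
    return min(idx, n_bins - 1)  # x == high falls into the last bin


def q_update(q, s, a, cost, s_next, alpha, beta):
    """One cost-minimizing Q-learning step on quantized states.

    q is a list of lists indexed as q[state_bin][action]. The agent uses
    only its own action and observed cost (hypothetical interface); other
    agents' actions are never observed.
    """
    target = cost + beta * min(q[s_next])
    q[s][a] = (1 - alpha) * q[s][a] + alpha * target
    return q[s][a]


# Example: 4 bins over [0, 1], 2 actions per agent.
q_table = [[0.0, 0.0] for _ in range(4)]
s = quantize(0.26, 0.0, 1.0, 4)       # bin 1
s_next = quantize(0.80, 0.0, 1.0, 4)  # bin 3
q_update(q_table, s, a=0, cost=1.0, s_next=s_next, alpha=0.5, beta=0.9)
```

A finer grid reduces the quantization error but enlarges the table each agent must learn, which is the trade-off the simulation study is described as examining.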
Abstract
Stochastic games are a popular framework for studying multi-agent reinforcement learning (MARL). Recent advances in MARL have focused primarily on games with finitely many states. In this work, we study multi-agent learning in stochastic games with general state spaces and an information structure in which agents do not observe each other's actions. In this context, we propose a decentralized MARL algorithm and prove the near-optimality of its policy updates. Furthermore, we study the global policy-updating dynamics for a general class of best-reply based algorithms and derive a closed-form characterization of convergence probabilities over the joint policy space.
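To make the idea of closed-form convergence probabilities concrete, the sketch below uses the standard result for absorbing Markov chains: if Q holds transition probabilities among transient states and R holds transition probabilities from transient to absorbing states, the absorption probabilities are B = (I - Q)^{-1} R. Reading absorbing states as equilibrium joint policies and transient states as non-equilibrium joint policies is an interpretation of the summary above; the paper's own expression and notation may differ, and the function name and toy chain here are hypothetical.

```python
from fractions import Fraction


def absorption_probabilities(Q, R):
    """Return B = (I - Q)^{-1} R via Gauss-Jordan elimination.

    Q: n x n transition probabilities among transient states.
    R: n x m transition probabilities from transient to absorbing states.
    B[i][k] is the probability of ending in absorbing state k when
    starting from transient state i. Exact arithmetic via Fraction.
    """
    n, m = len(Q), len(R[0])
    # Build the augmented matrix [I - Q | R].
    aug = [
        [Fraction(int(i == j)) - Fraction(Q[i][j]) for j in range(n)]
        + [Fraction(R[i][k]) for k in range(m)]
        for i in range(n)
    ]
    for col in range(n):
        # I - Q is invertible for an absorbing chain, so a pivot exists.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Toy chain (illustrative only): two transient states, two absorbing states.
half = Fraction(1, 2)
Q = [[0, half], [half, 0]]
R = [[half, 0], [0, half]]
B = absorption_probabilities(Q, R)
```

Each row of B sums to one, reflecting that the chain is absorbed with probability one from any transient starting state.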
