Decentralized Multi-Agent Reinforcement Learning for Continuous-Space Stochastic Games

Awni Altabaa; Bora Yongacoglu; Serdar Yüksel

TL;DR

The paper addresses decentralized MARL in stochastic games with general state spaces by extending decentralized Q-learning to continuous spaces through state-space quantization and a two-time-scale learning scheme. It proves that, under weak continuity and bounded costs, agents achieve near-optimal policy updates with respect to their observed environments, and it characterizes global policy-updating dynamics as an absorbing Markov chain with a closed-form expression for equilibrium convergence probabilities. By analyzing both idealized updating dynamics and their quantized approximations, the work provides conditions under which self-play converges to (near-)equilibria and discusses limitations in achieving global team-optimality. A simulation study on a two-player stochastic team corroborates the theory, illustrating convergence behavior and the impact of quantization and exploration on attaining team-optimal policies.
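The quantization idea mentioned above can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a one-dimensional state in a bounded interval, a uniform quantizer, a finite action set, and a single cost-minimizing tabular Q-learning step. The names `quantize`, `q_update`, `alpha`, and `beta` are hypothetical and chosen only for illustration.

```python
def quantize(x, low, high, n_bins):
    """Map a continuous state x in [low, high] to a bin index in {0, ..., n_bins - 1}."""
    if not low <= x <= high:
        raise ValueError("state outside the quantizer's range")
    width = (high - low) / n_bins
    # min() keeps the right endpoint x == high inside the last bin.
    return min(int((x - low) / width), n_bins - 1)


def q_update(q, s_bin, action, cost, next_bin, alpha, beta):
    """One tabular Q-learning step on quantized states (discounted cost minimization)."""
    # Costs are minimized, so the bootstrap target uses min over next-state actions.
    target = cost + beta * min(q[next_bin])
    q[s_bin][action] += alpha * (target - q[s_bin][action])
    return q[s_bin][action]


# Illustrative values only (assumptions, not taken from the paper).
n_bins, n_actions = 4, 2
q_table = [[0.0] * n_actions for _ in range(n_bins)]
s = quantize(0.30, 0.0, 1.0, n_bins)
s_next = quantize(0.80, 0.0, 1.0, n_bins)
new_value = q_update(q_table, s, action=0, cost=1.0, next_bin=s_next, alpha=0.5, beta=0.9)
```

In this sketch the agent only ever sees bin indices, which is what lets a finite Q-table stand in for a continuous state space; a finer quantizer gives a closer approximation at the price of a larger table.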

Abstract

Stochastic games are a popular framework for studying multi-agent reinforcement learning (MARL). Recent advances in MARL have focused primarily on games with finitely many states. In this work, we study multi-agent learning in stochastic games with general state spaces and an information structure in which agents do not observe each other's actions. In this context, we propose a decentralized MARL algorithm and we prove the near-optimality of its policy updates. Furthermore, we study the global policy-updating dynamics for a general class of best-reply based algorithms and derive a closed-form characterization of convergence probabilities over the joint policy space.
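To make the idea of convergence probabilities for an absorbing Markov chain concrete, the following sketch computes absorption probabilities with the standard fundamental-matrix identity $B = (I - Q)^{-1} R$. It does not reproduce the paper's closed-form expression. Reading transient states as non-equilibrium joint policies and absorbing states as equilibria is an assumption made here for illustration, and the function name and the toy matrices `Q` and `R` are hypothetical.

```python
from fractions import Fraction


def absorption_probabilities(Q, R):
    """Solve (I - Q) B = R by Gauss-Jordan elimination.

    Q[i][j]: transition probability from transient state i to transient state j.
    R[i][a]: transition probability from transient state i to absorbing state a.
    Returns B, where B[i][a] is the probability of being absorbed in a from i.
    """
    n, m = len(Q), len(R[0])
    # Augmented matrix [I - Q | R], kept in exact rational arithmetic.
    aug = [
        [Fraction(int(i == j)) - Fraction(Q[i][j]) for j in range(n)]
        + [Fraction(R[i][a]) for a in range(m)]
        for i in range(n)
    ]
    for col in range(n):
        # A nonzero pivot exists because I - Q is invertible for an absorbing chain.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


# Toy chain: two transient states, two absorbing states (illustrative values).
half = Fraction(1, 2)
Q = [[0, half], [half, 0]]
R = [[half, 0], [0, half]]
B = absorption_probabilities(Q, R)
```

Each row of `B` sums to one, reflecting that the chain is absorbed with probability one from every transient state.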

Paper Structure (14 sections, 5 theorems, 21 equations, 1 figure, 2 algorithms)

Introduction
Related Work
Background on Stochastic Games
Extending the Decentralized Q-Learning Algorithm to the Continuous-Space Setting
Quantization of State and Action Spaces
Single-Agent Quantized Q-Learning
Decentralized Multi-Agent Q-Learning Algorithm for the Continuous-Space Setting
Policy-Updating Dynamics and Convergence to Equilibrium
Global Policy-Updating Dynamics Modeled as a Markov Chain
Convergence to Equilibrium: Closed-form Probabilistic Characterization
Convergence to Equilibrium
Games with Convergence Guarantees
Simulation Study
Conclusion

Key Result

Theorem 4.1

Suppose all players use the decentralized Q-learning algorithm for the continuous-space setting (Algorithm alg:cts_dec_qlearning) to select their actions. For any $\epsilon > 0$, there exists $\tilde{T}$ such that $T_k \geq \tilde{T}$ implies […], where $\boldsymbol{\pi}_k$ is the baseline joint policy during the $k^{\rm th}$ exploration phase and $\boldsymbol{\pi}_{k, \rho}$ is the perturbation of $\boldsymbol{\pi}_k$ that is used for action selection. Furthermore, for any …

Figures (1)

Figure 1: Simulation results: proportion of 50 trials where the policy at the $k$th exploration phase was optimal

Theorems & Definitions (15)

Definition 3.1
Definition 3.2
Theorem 4.1
proof
Remark
Proposition 5.1
proof
Proposition 5.2
proof
Remark
...and 5 more
