Turn-based Multi-Agent Reinforcement Learning Model Checking

Dennis Gross

TL;DR

This paper addresses the challenge of verifying turn-based multi-agent reinforcement learning (TMARL) agents in stochastic multiplayer games, where existing verification approaches do not scale to multiple agents. It tightly integrates TMARL with probabilistic model checking: the system is modeled as a Markov decision process (MDP), and a joint policy wrapper resolves the agents' action choices so that the induced discrete-time Markov chain (DTMC) can be verified against PCTL properties. The approach scales better than naive monolithic model checking across diverse benchmarks and provides insights into agent behavior and strategic moves. This work advances reliable validation of TMARL in complex environments, with practical implications for safer and better-designed turn-based game AI.

Abstract

In this paper, we propose a novel approach for verifying the compliance of turn-based multi-agent reinforcement learning (TMARL) agents with complex requirements in stochastic multiplayer games. Our method overcomes the limitations of existing verification approaches, which are inadequate for dealing with TMARL agents and not scalable to large games with multiple agents. Our approach relies on tight integration of TMARL and a verification technique referred to as model checking. We demonstrate the effectiveness and scalability of our technique through experiments in different types of environments. Our experiments show that our method is suited to verify TMARL agents and scales better than naive monolithic model checking.
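
To make the MDP-to-DTMC step concrete, the following is a minimal sketch, not the paper's implementation. It assumes a toy MDP stored as a nested dictionary and a deterministic policy; the names `induce_dtmc`, `reach_probability`, `example_mdp`, and `risky_policy` are hypothetical. Fixing the policy leaves exactly one action per state, so the reachable part of the MDP collapses into a DTMC, on which a reachability probability (the kind of quantity a PCTL property constrains) can be computed.

```python
from typing import Callable, Dict, List, Tuple

State = int
Action = str
# mdp[state][action] is a list of (probability, next_state) pairs (assumed encoding).
MDP = Dict[State, Dict[Action, List[Tuple[float, State]]]]
DTMC = Dict[State, Dict[State, float]]


def induce_dtmc(mdp: MDP, policy: Callable[[State], Action], initial: State) -> DTMC:
    """Build the DTMC induced by a fixed policy, visiting only reachable states."""
    dtmc: DTMC = {}
    frontier = [initial]
    while frontier:
        state = frontier.pop()
        if state in dtmc:
            continue
        row: Dict[State, float] = {}
        # The policy resolves the nondeterminism: only one action is followed.
        for prob, nxt in mdp[state][policy(state)]:
            row[nxt] = row.get(nxt, 0.0) + prob
        dtmc[state] = row
        frontier.extend(n for n in row if n not in dtmc)
    return dtmc


def reach_probability(dtmc: DTMC, initial: State, target: State,
                      iterations: int = 1000) -> float:
    """Approximate the probability of eventually reaching `target` by fixed-point iteration."""
    prob = {s: 1.0 if s == target else 0.0 for s in dtmc}
    for _ in range(iterations):
        prob = {
            s: 1.0 if s == target
            else sum(p * prob[n] for n, p in dtmc[s].items())
            for s in dtmc
        }
    return prob[initial]


# Toy example: state 0 is the start, state 1 is "win", state 2 is "lose".
example_mdp: MDP = {
    0: {"safe": [(0.5, 1), (0.5, 0)], "risky": [(0.7, 1), (0.3, 2)]},
    1: {"stay": [(1.0, 1)]},
    2: {"stay": [(1.0, 2)]},
}


def risky_policy(state: State) -> Action:
    return "risky" if state == 0 else "stay"


dtmc = induce_dtmc(example_mdp, risky_policy, initial=0)
win_probability = reach_probability(dtmc, initial=0, target=1)
```

A property such as "the win state is reached with probability at least 0.6" would then be checked by comparing `win_probability` against the bound. A real probabilistic model checker solves this with exact or numerically controlled methods rather than a fixed number of iterations.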

Paper Structure (18 sections, 4 equations, 5 figures, 1 table)

INTRODUCTION
RELATED WORK
BACKGROUND
Probabilistic Systems
Turn-based Multi-Agent Reinforcement Learning (TMARL)
Model Checking of TMARL agents
Limitations.
EXPERIMENTS
Setup
Environments.
Trained TMARL agents.
Properties.
Technical setup.
Analysis
Does our proposed method scale better than naive monolithic model checking?
...and 3 more sections

Figures (5)

Figure 1: This diagram represents a single RL system in which an agent (Agent 1) interacts with an environment. The agent observes a state (denoted as $s$) and a reward (denoted as $r$) from the environment based on its previous action (denoted as $a$). The agent then uses this information to select the next action, which it sends back to the environment.
Figure 2: This diagram represents a TMARL system in which two agents (Agent 1 and Agent 2) interact in a turn-based manner with a shared environment. The agents receive states (denoted as $s_1$ and $s_2$) and rewards (denoted as $r_1$ and $r_2$) from the environment based on their previous actions (denoted as $a_1$ and $a_2$). The agents then use this information to select their next actions, which they send back to the environment.
Figure 3: An example of a joint policy wrapper with two policies. The wrapper takes in a state (denoted as $s$) and extracts the current turn from that state. It then uses this information to determine which of two policies ($\pi_1$ and $\pi_2$) should choose the next action. The selected policy then produces an action, which is output by the joint policy wrapper.
Figure 4: This screenshot shows a scene from the Showdown AI competition, in which two Pokémon are engaged in a battle. We model this scene in PRISM. The AI-controlled Pokémon use different policies $\pi_i$ to try to defeat their opponent. The outcome of the battle depends on the abilities and actions of the two Pokémon, as well as on random elements.
Figure 5: The diagram shows how the time it takes to build a state for a TMARL system changes as the number of agents in the system increases. The curve indicates that this time grows exponentially with the number of agents.
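
The joint policy wrapper described in the Figure 3 caption can be sketched as follows. This is an illustrative sketch under assumptions, not the paper's code: the state is assumed to be a dictionary with a zero-based `"turn"` entry, and the class name `JointPolicyWrapper` and the toy policies `pi_1` and `pi_2` are hypothetical.

```python
from typing import Callable, Dict, Sequence

# Assumed state encoding: a flat dictionary of integer features,
# one of which records whose turn it is (0 for the first agent).
State = Dict[str, int]
Policy = Callable[[State], str]


class JointPolicyWrapper:
    """Dispatch each state to the policy of the agent whose turn it is."""

    def __init__(self, policies: Sequence[Policy], turn_key: str = "turn") -> None:
        self.policies = list(policies)
        self.turn_key = turn_key

    def __call__(self, state: State) -> str:
        # Extract the current turn from the state, then let the matching
        # policy choose the action that the wrapper outputs.
        turn = state[self.turn_key]
        return self.policies[turn](state)


def pi_1(state: State) -> str:
    # Toy policy for agent 1: heal when hit points are low, otherwise attack.
    return "attack" if state["hp"] > 5 else "heal"


def pi_2(state: State) -> str:
    # Toy policy for agent 2: always defend.
    return "defend"


joint_policy = JointPolicyWrapper([pi_1, pi_2])
action_agent_1 = joint_policy({"turn": 0, "hp": 8})
action_agent_2 = joint_policy({"turn": 1, "hp": 8})
```

Because the wrapper behaves like a single policy over the whole game state, it can be applied to the model in the same way as a single-agent policy.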

Theorems & Definitions (4)

Definition 3.1: Markov Decision Process
Definition 3.2: Policy
Definition 3.3: PCTL Syntax
Definition 3.4: Turn-based stochastic multi-player game
