PyTAG: Tabletop Games for Multi-Agent Reinforcement Learning

Martin Balla, George E. M. Long, James Goodman, Raluca D. Gaina, Diego Perez-Liebana

TL;DR

PyTAG addresses the lack of a unified multi-agent reinforcement learning (MARL) benchmark for modern tabletop games (TTGs) by providing a versatile Python API that interfaces with the TAG framework, enabling self-play-based PPO training across a diverse set of games. The work demonstrates how game-specific observation extraction and action masking can support reinforcement learning in environments with varied turn orders, hidden information, and large action spaces, and it evaluates the trained agents against simple baselines and MCTS. Key contributions include the integration of eight TTGs, a self-play training loop with an opponent pool, and an analysis of challenges and opportunities in TTG MARL, with open-source code to accelerate community adoption. The findings highlight both the feasibility and the limitations of current MARL approaches in TTGs and point to future directions such as memory-augmented agents and language-model-assisted state interpretation to enhance performance and generalisation.
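
Action masking, mentioned above, can be illustrated with a minimal sketch. This is not PyTAG's API: the function names `masked_softmax` and `sample_action`, and the representation of the mask as a list of booleans, are assumptions made for illustration. The idea is that illegal actions receive zero probability before the policy samples an action.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over logits that gives zero probability to masked-out actions.

    `mask[i]` is True when action i is legal (hypothetical convention).
    """
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    max_logit = max(l for l, m in zip(logits, mask) if m)
    exps = [math.exp(l - max_logit) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits, mask, rng=random):
    """Sample an action index from the masked policy distribution."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Usage: action 1 is illegal, so it is never sampled.
probs = masked_softmax([1.0, 2.0, 3.0], [True, False, True])
action = sample_action([1.0, 2.0, 3.0], [True, False, True], random.Random(0))
```

In a PPO setting the same mask would typically also be applied when computing log-probabilities during the update, so that the policy gradient only flows through legal actions.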

Abstract

Modern Tabletop Games present various interesting challenges for Multi-agent Reinforcement Learning. In this paper, we introduce PyTAG, a new framework that supports interacting with a large collection of games implemented in the Tabletop Games framework. We highlight the challenges tabletop games pose from a game-playing agent's perspective, along with the opportunities they provide for future research. Additionally, we highlight the technical challenges involved in training Reinforcement Learning agents on these games. To explore the Multi-agent setting provided by PyTAG, we train the popular Proximal Policy Optimisation Reinforcement Learning algorithm using self-play on a subset of games and evaluate the trained policies against some simple agents and Monte-Carlo Tree Search implemented in the Tabletop Games framework.

Paper Structure (23 sections, 10 figures, 2 tables)

Figures (10)

  • Figure 1: "Sushi Go!" Graphical User Interface in TAG
  • Figure 2: The overall self-play setting used in our experiments.
  • Figure 3: Some metrics to highlight the outcomes observed during self-play. All plots show the running mean of the metric from the last 100 episodes with the standard error shown with the shaded area. From left to right: Exploding Kittens win rate, Tic Tac Toe tie rate, learner's achieved scores and score differences (score of the winner minus learner's score) in Diamant.
  • Figure 4: "Tic Tac Toe" evaluation win and tie rate.
  • Figure 5: Evaluation win rate against the baseline agents in 2 and 4-player "Diamant". Solid lines show agents trained with only the Terminal reward. Dashed lines used the score as reward.
  • ...and 5 more figures
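
The Figure 3 caption describes a running mean over the last 100 episodes with a standard-error band. A minimal sketch of that computation follows; the class name `RunningStat` is hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption, since the caption does not state how the standard error was computed.

```python
import math
from collections import deque


class RunningStat:
    """Mean and standard error over a sliding window of recent episodes."""

    def __init__(self, window=100):
        # deque with maxlen drops the oldest value once the window is full.
        self.values = deque(maxlen=window)

    def add(self, value):
        self.values.append(float(value))

    def mean(self):
        return sum(self.values) / len(self.values)

    def standard_error(self):
        n = len(self.values)
        if n < 2:
            return 0.0
        m = self.mean()
        # Sample variance (n - 1 denominator), an assumption of this sketch.
        variance = sum((v - m) ** 2 for v in self.values) / (n - 1)
        return math.sqrt(variance / n)


# Usage: log 1.0 for a win and 0.0 otherwise, then read off the win rate.
win_rate = RunningStat(window=100)
for outcome in [1, 0, 1, 1, 0]:
    win_rate.add(outcome)
band = (win_rate.mean() - win_rate.standard_error(),
        win_rate.mean() + win_rate.standard_error())
```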