Towards a Characterisation of Monte-Carlo Tree Search Performance in Different Games

Dennis J. N. J. Soemers, Guillaume Bams, Max Persoon, Marco Rietjens, Dimitar Sladić, Stefan Stefanov, Kurt Driessens, Mark H. M. Winands

TL;DR

The paper addresses the need to understand which MCTS variants perform best in different games by constructing a large, labeled dataset of MCTS plays across diverse Ludii games and by representing each game with an extensive feature set. It introduces 61 agents (including a Random baseline) and 1494 games, records outcomes as utilities in the range [-1,1], and uses SHAP for interpretation of predictive models that relate agent configurations and game features to performance. Preliminary results show that simple models can capture meaningful signals (with Random Forest achieving lower RMSE/MAE than a dummy baseline) and reveal interpretable patterns, such as the predictive value of random-playout advantages and the negative impact of early-terminated playouts. The work lays a foundation for characterizing when specific MCTS variants succeed, highlights practical lessons for dataset construction and evaluation, and outlines plans for a more comprehensive next version to enable deeper, feature-interaction analyses.
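The comparison against a dummy baseline mentioned above can be illustrated with a minimal sketch. It assumes the dummy baseline always predicts the mean training utility, which the summary does not state explicitly. All numbers, including the "model" predictions, are hypothetical values chosen for illustration and are not results from the paper.

```python
import math
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean squared error between true and predicted utilities."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error between true and predicted utilities."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Hypothetical utilities for the first agent, each in [-1, 1]
# (1 = win, 0 = draw, -1 = loss in this toy example).
train_utilities = [1.0, -1.0, 0.0, 1.0, 1.0, -1.0]
test_utilities = [1.0, 0.0, -1.0, 1.0]

# Assumed dummy baseline: always predict the mean training utility.
baseline_value = mean(train_utilities)
dummy_predictions = [baseline_value] * len(test_utilities)

# Hypothetical predictions standing in for a trained regressor.
model_predictions = [0.8, 0.1, -0.6, 0.7]

dummy_rmse = rmse(test_utilities, dummy_predictions)
dummy_mae = mae(test_utilities, dummy_predictions)
model_rmse = rmse(test_utilities, model_predictions)
model_mae = mae(test_utilities, model_predictions)

print(f"dummy: RMSE={dummy_rmse:.3f}, MAE={dummy_mae:.3f}")
print(f"model: RMSE={model_rmse:.3f}, MAE={model_mae:.3f}")
```

A model is only informative in this sense if both of its error measures fall below those of the dummy baseline on held-out plays.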

Abstract

Many enhancements to Monte-Carlo Tree Search (MCTS) have been proposed over almost two decades of general game playing and other artificial intelligence research. However, our ability to characterise and understand which variants work well or poorly in which games is still lacking. This paper describes work on an initial dataset that we have built to make progress towards such an understanding: 268,386 plays among 61 different agents across 1494 distinct games. We describe a preliminary analysis and work on training predictive models on this dataset, as well as lessons learned and future plans for a new and improved version of the dataset.

Paper Structure (9 sections, 1 figure, 2 tables)

Figures (1)

  • Figure 1: SHAP (Lundberg and Lee, 2017) value estimates of feature importance for the five top individual contributors (plus all other features together), for a Random Forest. Red (blue) means a high (low) feature value. Right (left) means a strong positive (negative) impact on predicted utility for the first agent.