Efficient Monte Carlo Tree Search via On-the-Fly State-Conditioned Action Abstraction

Yunhyeok Kwak; Inwoo Hwang; Dooyoung Kim; Sanghack Lee; Byoung-Tak Zhang

TL;DR

This work tackles the challenge of planning with very large factored action spaces in Monte Carlo Tree Search by learning state-conditioned action abstractions from high-dimensional observations. It introduces a latent dynamics model together with a conditional structure inference network that outputs a state-dependent mask $M(z)$, yielding an abstract action $\phi_z(A)$ that includes only the sub-actions relevant to the transition. The method trains the latent dynamics model with a $K$-step reconstruction loss, uses straight-through Gumbel-Softmax to infer context-specific independence (CSI), and integrates the abstraction directly into MCTS by selecting and expanding on $\phi_z(a)$ while marginalizing the policy over actions that map to the same abstract action. Experiments on DoorKey and Sokoban with up to 405 action configurations show substantial gains in sample efficiency over MuZero, and qualitative analyses (GradCAM and structural Hamming distance, SHD) indicate that the model captures meaningful context-specific dependencies. The approach applies to environments with pixel observations and no known model, offering a scalable path to efficient planning in complex, real-world domains where actions are naturally factored.
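The masking and marginalization steps described above can be illustrated with a small sketch. This is not the authors' implementation: the function names, the use of `None` to mark discarded sub-actions, and the toy action space of three binary sub-actions are all assumptions made for illustration, and the mask is supplied by hand rather than produced by a learned network.

```python
from collections import defaultdict
from itertools import product


def abstract_action(action, mask):
    """Keep sub-actions whose mask entry is 1; replace the rest with None."""
    return tuple(a if m else None for a, m in zip(action, mask))


def marginalize_prior(prior, mask):
    """Sum the prior over all actions that share the same abstract action."""
    abstract_prior = defaultdict(float)
    for action, p in prior.items():
        abstract_prior[abstract_action(action, mask)] += p
    return dict(abstract_prior)


# Hypothetical factored action space: three sub-actions, two values each.
sub_action_domains = [(0, 1), (0, 1), (0, 1)]
actions = list(product(*sub_action_domains))

# Uniform policy prior over the 8 concrete actions (illustrative only).
prior = {a: 1.0 / len(actions) for a in actions}

# Hand-written mask standing in for M(z): only the first sub-action matters.
mask = (1, 0, 0)

abstract_prior = marginalize_prior(prior, mask)
```

In this toy setting the eight concrete actions collapse into two abstract actions, so a tree node would only need to expand two children instead of eight.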

Abstract

Monte Carlo Tree Search (MCTS) has showcased its efficacy across a broad spectrum of decision-making problems. However, its performance often degrades under a vast combinatorial action space, especially where an action is composed of multiple sub-actions. In this work, we propose an action abstraction based on the compositional structure between a state and sub-actions for improving the efficiency of MCTS under a factored action space. Our method learns a latent dynamics model with an auxiliary network that captures the sub-actions relevant to the transition at the current state, which we call state-conditioned action abstraction. Notably, it infers such compositional relationships from high-dimensional observations without a known environment model. During the tree traversal, our method constructs the state-conditioned action abstraction for each node on-the-fly, reducing the search space by skipping the exploration of redundant sub-actions. Experimental results demonstrate the superior sample efficiency of our method compared to vanilla MuZero, which suffers from an expansive action space.
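To make the search-space reduction concrete, the following sketch counts how many children a node would expand with and without an abstraction. The sub-action domain sizes and the mask are hypothetical values chosen for illustration and are not taken from the paper's environments; the helper name is likewise an assumption.

```python
from math import prod


def num_abstract_actions(domain_sizes, mask):
    """Count distinct abstract actions: product of sizes of unmasked sub-actions."""
    return prod(n for n, m in zip(domain_sizes, mask) if m)


# Hypothetical factored action space with three sub-actions.
domain_sizes = [5, 3, 3]

# Full branching factor: every combination of sub-actions is a separate child.
full_branching = prod(domain_sizes)

# Hypothetical state-dependent mask: the second sub-action is irrelevant here.
mask = (1, 0, 1)
abstract_branching = num_abstract_actions(domain_sizes, mask)
```

Under these assumed sizes, the node's branching factor drops from 45 to 15. Because the mask depends on the state, the reduction would differ from node to node.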

Paper Structure (24 sections, 13 equations, 16 figures, 4 tables)

Introduction
Preliminaries
Monte Carlo Tree Search
Context-Specific Independence
Method
State-conditioned action abstraction
Training Latent Dynamics Model
Complete method: MCTS with State-Conditioned Action Abstraction
Experiments
Experimental Setup
Environments
Results
Related Work
Conclusion
Environmental details
...and 9 more sections

Figures (16)

Figure 1: Normalized score of MuZero (Schrittwieser et al., 2020) and our method in environments with a factored action space. In contrast to MuZero, which suffers as the number of available actions increases, our method with state-conditioned action abstraction remains effective.
Figure 2: Overall framework. (a) Training latent dynamics model with conditional structure inference network. (b) The proposed MCTS with state-conditioned action abstraction.
Figure 3: Sample images for each environment.
Figure 4: Comparison of aggregate metrics across all tasks.
Figure 5: Learning curves. The average episodic return is depicted by lines, and the shaded areas represent the $95\%$ confidence intervals.
...and 11 more figures

Theorems & Definitions (1)

Definition 2.1: Context-Specific Independence
