Efficient Monte Carlo Tree Search via On-the-Fly State-Conditioned Action Abstraction
Yunhyeok Kwak, Inwoo Hwang, Dooyoung Kim, Sanghack Lee, Byoung-Tak Zhang
TL;DR
This work addresses planning with very large factored action spaces in Monte Carlo Tree Search (MCTS) by learning state-conditioned action abstractions from high-dimensional observations. It pairs a latent dynamics model with a conditional structure inference network that outputs a state-dependent mask $M(z)$; applying the mask to an action $a$ yields an abstract action $\phi_z(a)$ that keeps only the sub-actions relevant to the transition from latent state $z$. The latent dynamics model is trained with a $K$-step reconstruction loss, and a straight-through Gumbel-Softmax estimator is used to infer context-specific independence (CSI). The abstraction is integrated directly into MCTS: selection and expansion operate on $\phi_z(a)$, and the policy is marginalized over all actions that map to the same abstract action. Experiments on DoorKey and Sokoban with up to 405 action configurations show substantial gains in sample efficiency over MuZero, and further analyses (GradCAM, SHD) indicate that the model captures meaningful context-specific dependencies. Because the approach needs no known environment model and works from pixel observations, it offers a path toward efficient planning in domains where actions are naturally factored.
Abstract
Monte Carlo Tree Search (MCTS) has proven effective across a broad spectrum of decision-making problems. However, its performance often degrades under vast combinatorial action spaces, especially when an action is composed of multiple sub-actions. In this work, we propose an action abstraction based on the compositional structure between a state and sub-actions to improve the efficiency of MCTS under a factored action space. Our method learns a latent dynamics model with an auxiliary network that captures the sub-actions relevant to the transition from the current state, which we call state-conditioned action abstraction. Notably, it infers such compositional relationships from high-dimensional observations without a known environment model. During tree traversal, our method constructs the state-conditioned action abstraction for each node on the fly, reducing the search space by skipping the exploration of redundant sub-actions. Experimental results demonstrate the superior sample efficiency of our method compared to vanilla MuZero, which suffers from the expansive action space.
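To make the abstraction concrete, the following is a minimal Python sketch of how a state-dependent mask could collapse a factored action space and how a policy prior could be marginalized over actions that share an abstract action. It is an illustration under stated assumptions, not the authors' implementation: the function names (`abstract_action`, `marginalize_policy`), the toy action space, and the hard-coded mask are all hypothetical, whereas in the method the mask is produced by a learned network from the latent state.

```python
from collections import defaultdict
from itertools import product

# Placeholder for a sub-action that the mask marks as irrelevant (assumption).
IRRELEVANT = None


def abstract_action(action, mask):
    """Sketch of phi_z(a): keep sub-actions where mask is 1, blank out the rest."""
    return tuple(a if m else IRRELEVANT for a, m in zip(action, mask))


def marginalize_policy(policy, mask):
    """Sum the prior probabilities of all actions sharing one abstract action."""
    abstract_policy = defaultdict(float)
    for action, prob in policy.items():
        abstract_policy[abstract_action(action, mask)] += prob
    return dict(abstract_policy)


# Toy factored action space: three sub-actions with 3, 2, and 2 choices,
# giving 3 * 2 * 2 = 12 concrete actions.
sub_action_sizes = (3, 2, 2)
actions = list(product(*(range(n) for n in sub_action_sizes)))
uniform_policy = {a: 1.0 / len(actions) for a in actions}

# Hypothetical mask for some latent state z: only the first sub-action
# affects the transition, so the other two are discarded.
mask = (1, 0, 0)
abstract_policy = marginalize_policy(uniform_policy, mask)

print(len(actions), "->", len(abstract_policy))  # prints "12 -> 3"
```

In this toy case the search at that node would branch over 3 abstract actions instead of 12 concrete ones, and each abstract action inherits the summed prior of the 4 concrete actions it covers. A different state would generally yield a different mask, which is why the abstraction has to be rebuilt per node.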
