Discovering State Equivalences in UCT Search Trees By Action Pruning
Robin Schmöcker, Alexander Dockhorn, Bodo Rosenhahn
TL;DR
The work aims to improve MCTS sample efficiency by discovering state equivalences through abstractions. It introduces IPA-UCT, an IPA-based modification of OGA-UCT, along with the p-ASAP/ASASAP hierarchy that unifies state and state-action pair abstractions. Theoretical analysis shows that ASAP finds few state abstractions in noisy or large-action-space settings, while IPA uses an ideal action-pruning mechanism to reveal more state equivalences. Empirically, IPA-UCT yields consistent gains with modest runtime overhead across diverse domains. The findings advance practical search efficiency by enabling more aggressive yet principled state abstractions. Future work targets automatic parameter tuning and abstraction paradigms beyond action-driven pruning.
Abstract
One approach to enhancing Monte Carlo Tree Search (MCTS) is to improve its sample efficiency by grouping (abstracting) states or state-action pairs and sharing statistics within each group. Although state-action pair abstractions are usually easy to find in algorithms such as On the Go Abstractions in Upper Confidence bounds applied to Trees (OGA-UCT), nearly no state abstractions are found in noisy or large action space settings because the state abstraction conditions are too restrictive. We provide theoretical and empirical evidence for this claim, and we slightly alleviate this state abstraction problem by proposing a weaker state abstraction condition that trades a minor loss in accuracy for finding many more abstractions. We name this technique Ideal Pruning Abstractions in UCT (IPA-UCT); our experiments show that it outperforms OGA-UCT (and any of its derivatives) across a wide range of test domains and iteration budgets. IPA-UCT relies on an abstraction framework we name IPA, which differs from the Abstraction of State-Action Pairs (ASAP) framework used by OGA-UCT. Furthermore, we show that both IPA and ASAP are special cases of a more general framework that we call p-ASAP, which is itself a special case of the ASASAP framework.
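The two ideas in the abstract, pooling statistics within abstract groups and a strict state condition that rarely fires, can be illustrated with a toy sketch. This is not IPA-UCT or OGA-UCT. It assumes each state already carries a mapping from its actions to abstract state-action class ids, uses a simplified ASAP-style state test (two states are grouped only if they expose exactly the same set of abstract state-action classes), and contrasts it with a hypothetical weaker test that compares only the classes of actions kept after pruning. All function names, class ids and pruning sets are illustrative assumptions; the exact IPA condition is not stated in this section.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, Mapping, Set, Tuple

# Hypothetical representation: action -> id of the abstract state-action class
# that (state, action) was assigned to, e.g. by an ASAP-style reward/transition test.
QClasses = Mapping[Hashable, int]


def asap_states_equivalent(q_s: QClasses, q_t: QClasses) -> bool:
    """Simplified ASAP-style test: group two states only if they offer
    exactly the same set of abstract state-action classes."""
    return set(q_s.values()) == set(q_t.values())


def pruned_states_equivalent(q_s: QClasses, q_t: QClasses,
                             keep_s: Set[Hashable], keep_t: Set[Hashable]) -> bool:
    """Hypothetical weaker test: compare only the classes of the actions
    that survive pruning (keep_s / keep_t are the retained actions)."""
    return {q_s[a] for a in keep_s} == {q_t[a] for a in keep_t}


@dataclass
class SharedStats:
    """Visit count and return sum pooled across all members of one group."""
    visits: int = 0
    value_sum: float = 0.0

    def update(self, ret: float) -> None:
        self.visits += 1
        self.value_sum += ret

    @property
    def mean(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0


def group_stats(assignment: Mapping[Hashable, Hashable],
                samples: Iterable[Tuple[Hashable, float]]) -> Dict[Hashable, SharedStats]:
    """Pool (node, return) samples by the abstract group each node belongs to."""
    stats: Dict[Hashable, SharedStats] = defaultdict(SharedStats)
    for node, ret in samples:
        stats[assignment[node]].update(ret)
    return stats


if __name__ == "__main__":
    # Two states whose "noop" actions landed in different classes (e.g. due to noise).
    s = {"left": 0, "right": 1, "noop": 2}
    t = {"left": 0, "right": 1, "noop": 3}
    print(asap_states_equivalent(s, t))  # False
    print(pruned_states_equivalent(s, t, {"left", "right"}, {"left", "right"}))  # True
    pooled = group_stats({"s": "g", "t": "g"}, [("s", 1.0), ("t", 0.0), ("s", 0.5)])
    print(pooled["g"].visits, pooled["g"].mean)  # 3 0.5
```

In the toy example a single action whose class differs is enough to block the strict test, which mirrors the claim about noisy or large action spaces: the more actions a state has, the more chances for one mismatch. Deciding which actions may safely be ignored, and how much accuracy that costs, is the part this sketch deliberately leaves open.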
