Discovering State Equivalences in UCT Search Trees By Action Pruning

Robin Schmöcker, Alexander Dockhorn, Bodo Rosenhahn

TL;DR

This work improves the sample efficiency of MCTS by discovering state equivalences through abstractions. It introduces IPA-UCT, a modification of OGA-UCT built on the IPA framework, along with the p-ASAP/ASASAP hierarchy that unifies state and state-action-pair abstractions. The theoretical analysis shows that ASAP is limited in the state abstractions it can find, whereas IPA's ideal pruning reveals more state equivalences; empirically, IPA-UCT yields consistent gains with modest runtime overhead across diverse domains. The findings make search more efficient in practice by enabling more aggressive yet principled state abstractions. Future work targets automatic parameter tuning and abstraction paradigms beyond action-driven pruning.

Abstract

One approach to enhance Monte Carlo Tree Search (MCTS) is to improve its sample efficiency by grouping/abstracting states or state-action pairs and sharing statistics within a group. Though state-action pair abstractions are mostly easy to find in algorithms such as On the Go Abstractions in Upper Confidence bounds applied to Trees (OGA-UCT), nearly no state abstractions are found in either noisy or large action space settings due to constraining conditions. We provide theoretical and empirical evidence for this claim, and we slightly alleviate this state abstraction problem by proposing a weaker state abstraction condition that trades a minor loss in accuracy for finding many more abstractions. We name this technique Ideal Pruning Abstractions in UCT (IPA-UCT); as we validate experimentally, it outperforms OGA-UCT (and any of its derivatives) across a large range of test domains and iteration budgets. IPA-UCT uses an abstraction framework, which we name IPA, that differs from Abstraction of State-Action Pairs (ASAP), the framework used by OGA-UCT. Furthermore, we show that both IPA and ASAP are special cases of a more general framework that we call p-ASAP, which itself is a special case of the ASASAP framework.
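
To make the idea of "sharing statistics within a group" concrete, the following is a minimal Python sketch, not the authors' IPA-UCT or OGA-UCT implementation. It assumes that an abstraction has already been computed elsewhere and is given as a mapping from state-action pairs to group ids. The class names, the `assign`/`update`/`ucb` methods, and the exploration constant `c` are hypothetical names introduced only for illustration.

```python
import math
from collections import defaultdict


class GroupStats:
    """Visit count and return sum shared by every pair in one abstraction group."""

    def __init__(self):
        self.visits = 0
        self.total_return = 0.0


class SharedStatistics:
    """Hypothetical table that pools statistics across abstracted state-action pairs."""

    def __init__(self):
        self.group_of = {}                      # (state, action) -> group id
        self.stats = defaultdict(GroupStats)    # group id -> pooled statistics

    def assign(self, state, action, group_id):
        # Assumption: the abstraction itself is found by some other procedure.
        self.group_of[(state, action)] = group_id

    def _group(self, state, action):
        # A pair without an assigned group forms its own singleton group.
        return self.group_of.get((state, action), (state, action))

    def update(self, state, action, ret):
        # Backpropagation step: the sampled return is credited to the whole group.
        s = self.stats[self._group(state, action)]
        s.visits += 1
        s.total_return += ret

    def ucb(self, state, action, parent_visits, c=2.0):
        # UCB1-style score computed from the pooled group statistics.
        s = self.stats[self._group(state, action)]
        if s.visits == 0:
            return math.inf
        mean = s.total_return / s.visits
        return mean + c * math.sqrt(math.log(parent_visits) / s.visits)


table = SharedStatistics()
table.assign("s1", "a", "g0")
table.assign("s2", "b", "g0")
table.update("s1", "a", 1.0)   # a sample observed through (s1, a) ...
table.update("s2", "b", 0.0)   # ... and one through (s2, b) land in the same group
```

In this sketch both pairs in group `g0` see two visits after two samples, which is the sample-efficiency benefit the abstract refers to; how the groups are discovered (ASAP, IPA, or otherwise) is deliberately left out.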

Paper Structure

This paper contains 20 sections, 13 equations, 59 figures, 5 tables, 1 algorithm.

Figures (59)

  • Figure 1: The hierarchy of abstraction frameworks that we propose, which relates Ideal Pruning Abstractions (IPA), Abstraction of State-Action Pairs (ASAP), p(runed)-ASAP, and Alternating State And State-Action Pair Abstractions (ASASAP). The leftmost diagram shows an IPA abstraction on an MDP with 5 states, drawn as black circles and connected by deterministic actions, drawn as black arrows. The red circles show which actions and states IPA groups/abstracts. The same MDP is shown on the right, but with ASAP abstractions, which do not manage to detect the equivalence of the two uppermost states.
  • Figure 2: A 5$\times$4 Navigation instance illustrating a case where the IPA framework (our method) detects the value equivalences of the state pairs (2, 4), (7, 9), and (12, 14), which ASAP cannot detect. The circle indicates the initial position, G indicates the goal cell, white cells have a reset probability of $0$, and black cells have a reset probability of $0.5$.
  • Figure 3: The pairings scores of the best IPA-UCT parameter combination compared to the best parameter combination of RSTATE-OGA, and both pruned OGA and $(\varepsilon_{\text{a}},\varepsilon_{\text{t}})$-OGA (summarized as OGA). The overall generalization performance of IPA-UCT for all iteration settings was achieved using pruned OGA with $\lambda_{\text{p}} = 2$, $C=2$, and $\alpha=0.75$.
  • Figure 4: The relative improvement scores of the best IPA-UCT parameter combination compared to the best parameter combination of RSTATE-OGA, and both pruned OGA and $(\varepsilon_{\text{a}},\varepsilon_{\text{t}})$-OGA (summarized as OGA). The overall generalization performance of IPA-UCT for all iteration settings was achieved using $(0,0.2)$-OGA with $\lambda_{\text{p}} = 1$ and $C=1$.
  • Figure : Pairings scores
  • ...and 54 more figures