Grouping Nodes With Known Value Differences: A Lossless UCT-based Abstraction Algorithm

Robin Schmöcker, Alexander Dockhorn, Bodo Rosenhahn

TL;DR

The paper addresses the sample inefficiency of Monte Carlo Tree Search (MCTS) by introducing Known Value Difference Abstractions (KVDA), a framework that extends Abstractions of State-Action Pairs (ASAP) to group states and state-action pairs whose value differences can be inferred. KVDA-UCT integrates KVDA into UCT-based search, using difference functions that converge to the true Q* and V* differences and computing aggregate statistics from values adjusted by those differences. In deterministic environments, KVDA-UCT finds more abstractions and often outperforms OGA-UCT and parameter-optimized baselines; in stochastic settings its advantage is less clear, which marks an area for further refinement. Overall, KVDA-UCT improves MCTS sample efficiency by exploiting known value gaps, offering a parameter-free alternative that keeps the abstraction lossless.

Abstract

A core challenge of Monte Carlo Tree Search (MCTS) is its sample efficiency, which can be improved by grouping state-action pairs and using their aggregate statistics instead of single-node statistics. On-the-Go Abstractions in Upper Confidence bounds applied to Trees (OGA-UCT) is the state-of-the-art MCTS abstraction algorithm for deterministic environments. It builds its abstraction using the Abstractions of State-Action Pairs (ASAP) framework, which aims to detect states and state-action pairs with the same value under optimal play by analysing the search graph. ASAP, however, requires two state-action pairs to have the same immediate reward, a rigid condition that limits the number of abstractions that can be found and thereby the sample efficiency. In this paper, we break with the paradigm of grouping value-equivalent states or state-action pairs and instead group states and state-action pairs with possibly different values, as long as the difference between their values can be inferred. We call this abstraction framework Known Value Difference Abstractions (KVDA); it infers the value differences by analysing the immediate rewards. We modify OGA-UCT to use this framework instead of ASAP and call the result KVDA-UCT. KVDA-UCT detects significantly more abstractions than OGA-UCT, introduces no additional parameter, and outperforms OGA-UCT on a variety of deterministic environments and parameter settings.
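
To make the idea of aggregating over nodes with known value differences concrete, the following is a minimal sketch and not the paper's implementation. It assumes each node in an abstract group carries a known offset relative to a shared reference value; the names `NodeStats`, `offset`, and `aggregate_q` are hypothetical and chosen only for illustration. Each node's returns are shifted by its offset before pooling, and the pooled mean is shifted back for the node being queried.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Per-node search statistics (hypothetical layout, for illustration only)."""
    visits: int        # number of times the node was sampled
    return_sum: float  # sum of sampled returns through the node
    offset: float      # assumed known value difference to the group's reference


def aggregate_q(group: list[NodeStats], target: NodeStats) -> float:
    """Pooled value estimate for `target`, using statistics of the whole group.

    Every node's returns are moved into the reference frame by subtracting
    its offset once per visit; the pooled mean is then moved back into the
    target's frame by adding the target's offset.
    """
    total_visits = sum(n.visits for n in group)
    if total_visits == 0:
        return 0.0  # no samples yet; the default value is an arbitrary choice
    shifted_sum = sum(n.return_sum - n.visits * n.offset for n in group)
    return shifted_sum / total_visits + target.offset


# Two state-action pairs whose values are assumed to differ by exactly 2.
a = NodeStats(visits=3, return_sum=30.0, offset=0.0)
b = NodeStats(visits=1, return_sum=12.0, offset=2.0)
group = [a, b]

q_a = aggregate_q(group, a)
q_b = aggregate_q(group, b)
```

In this sketch both estimates draw on all four samples while the two nodes keep distinct values. Setting every offset to zero reduces it to plain averaging over a group of value-equivalent nodes.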

Paper Structure

This paper contains 18 sections, 9 equations, 27 figures, 9 tables.

Figures (27)

  • Figure 1: An example of an MDP state-transition graph where the state-of-the-art abstraction framework ASAP [AnandGMS15] would detect no abstractions, while our method, Known Value Difference Abstractions (KVDA), detects three non-trivial abstractions. In this example, circles represent states, arrows represent deterministic state transitions, and arrow annotations denote the immediate transition reward. All actions or states that are intersected by a red ellipse will be abstracted by KVDA.
  • Figure 3: The normalized pairings score $(\uparrow)$ for the top 6 agents and the worst agent on deterministic environments. The agents considered were KVDA-UCT (our method), which performs best overall, OGA-UCT [OGAUCT], and $(\varepsilon_{\text{a}},0)$-OGA [ogacad] with $\varepsilon_{\text{a}} > 0$, each with the exploration constants $C \in \{0.5,1,2,4,8,16\}$ and budgets of $\{100,200,500,1000\}$ iterations. The top two spots are occupied by our method KVDA-UCT, with the best overall performing algorithm being KVDA-UCT with $C=2$.
  • Figure : (a) d-Manufacturer
  • Figure : (a) Elevators
  • Figure : (a) d-Manufacturer
  • ...and 22 more figures