Generalized Rapid Action Value Estimation in Memory-Constrained Environments

Aloïs Rautureau, Tristan Cazenave, Éric Piette

TL;DR

This paper introduces GRAVE$^2$, GRAVER and GRAVER$^2$, which extend GRAVE with two-level search, node recycling, and a combination of both techniques, respectively. These extensions drastically reduce the number of stored nodes while matching the playing strength of GRAVE.

Abstract

Generalized Rapid Action Value Estimation (GRAVE) has been shown to be a strong variant within the Monte-Carlo Tree Search (MCTS) family of algorithms for General Game Playing (GGP). However, its reliance on storing additional win/visit statistics at each node makes its use impractical in memory-constrained environments, thereby limiting its applicability in practice. In this paper, we introduce the GRAVE$^2$, GRAVER and GRAVER$^2$ algorithms, which extend GRAVE through two-level search, node recycling, and a combination of both techniques, respectively. We show that these enhancements enable a drastic reduction in the number of stored nodes while matching the playing strength of GRAVE.

Paper Structure

This paper contains 12 sections, 3 equations, 6 figures, and 1 table.

Figures (6)

  • Figure 1: Relationships between GRAVE, GRAVE$^2$, GRAVER and GRAVER$^2$. $N$ indicates the total number of nodes stored, $P$ the total number of playouts performed, and $N_{sec}$ the nodes stored in the second-level tree.
  • Figure 2: Forward node sharing in GRAVE$^2$. The selection path in the top-level tree is fixed while the second-level search is running, and the latter may use AMAF values aggregated in the top-level tree to guide its exploration as long as the second-level root has fewer visits than the parameterized reference threshold. Values obtained from playouts in the second-level tree are backpropagated to the top-level tree after each iteration, rather than in a single batch once the second-level search terminates. If a child of the currently referenced node exceeds the reference threshold, it becomes the new referenced node.
  • Figure 3: Winrates against GRAVE of GRAVE$^2$ (with and without forward sharing) and of UCT$^2$, with $P = N = 10,000$. The dotted lines indicate the 95% confidence interval of the winrate of GRAVE against itself, representing the region in which compared algorithms can be considered to have playing strength equal to GRAVE.
  • Figure 4: Winrates of GRAVER and UCT with node recycling against GRAVE with $P = 10,000$. The node pool size used is presented on a logarithmic scale. The red dashed line indicates the threshold $N = 10,000$, beyond which all expanded nodes can be stored and node recycling no longer takes effect.
  • Figure 5: Winrate of GRAVER$^2$ against GRAVE ($P = 10,000$), varying the ratio of playouts to stored nodes in the top-level tree (Left) and second-level tree (Right).
  • ...and 1 more figure
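
The reference-node rule described in the Figure 2 caption can be illustrated with a minimal sketch. This is not the authors' implementation: the `Node` class, the `referenced_node` function, and the representation of the search as a single root-first path (top-level selection path followed by second-level nodes) are assumptions made here for illustration. The sketch only shows how the referenced node could advance once a child's visit count exceeds the reference threshold.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical tree node; only the visit count matters for this sketch."""
    name: str
    visits: int = 0


def referenced_node(path: List[Node], ref_threshold: int) -> Node:
    """Return the node whose AMAF statistics would guide selection.

    `path` lists nodes root-first: the fixed top-level selection path
    followed by the nodes visited so far in the second-level tree
    (an assumed representation, not taken from the paper).
    """
    ref = path[0]  # start from the top-level root
    for child in path[1:]:
        if child.visits > ref_threshold:
            ref = child  # child exceeded the threshold: it becomes the reference
        else:
            break  # in this sketch, the reference only advances one child at a time
    return ref


# Illustrative visit counts, chosen arbitrarily.
top_root = Node("top_root", visits=100)
top_leaf = Node("top_leaf", visits=60)
second_root = Node("second_root", visits=20)
path = [top_root, top_leaf, second_root]

# While the second-level root is below the threshold, a top-level node is referenced.
early = referenced_node(path, ref_threshold=50).name  # "top_leaf"

# Once the second-level root exceeds the threshold, it becomes the referenced node.
second_root.visits = 51
late = referenced_node(path, ref_threshold=50).name  # "second_root"
```

Under these assumptions, a sparsely visited second-level tree borrows statistics from the top-level tree, and the borrowing stops as soon as the second-level root has accumulated enough visits of its own.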