Generalized Rapid Action Value Estimation in Memory-Constrained Environments

Aloïs Rautureau; Tristan Cazenave; Éric Piette

TL;DR

This paper introduces GRAVE$^2$, GRAVER and GRAVER$^2$, which extend GRAVE through two-level search, node recycling, and a combination of both, respectively. These algorithms drastically reduce the number of stored nodes while matching the playing strength of GRAVE.

Abstract

Generalized Rapid Action Value Estimation (GRAVE) has been shown to be a strong variant within the Monte-Carlo Tree Search (MCTS) family of algorithms for General Game Playing (GGP). However, it stores additional win/visit statistics at each node, which makes it impractical in memory-constrained environments. In this paper, we introduce the GRAVE$^2$, GRAVER and GRAVER$^2$ algorithms, which extend GRAVE through two-level search, node recycling, and a combination of both techniques, respectively. We show that these enhancements enable a drastic reduction in the number of stored nodes while matching the playing strength of GRAVE.
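
To make the idea of node recycling concrete, the following is a minimal sketch of a fixed-capacity node pool. It is not the paper's implementation: the abstract does not state the recycling policy, so least-recently-used eviction is assumed here, and the names `NodePool`, `allocate` and `get` are hypothetical. Nodes are keyed by a state identifier so that the sketch does not need to unlink parent pointers when a node is recycled.

```python
from collections import OrderedDict


class NodePool:
    """Fixed-capacity store of per-state statistics (hypothetical sketch).

    Assumption: the least recently used node is recycled when the pool is full.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._nodes = OrderedDict()  # insertion order doubles as recency order
        self.recycled = 0            # number of nodes evicted so far

    def get(self, key):
        """Return the node for `key`, or None, marking it as recently used."""
        node = self._nodes.get(key)
        if node is not None:
            self._nodes.move_to_end(key)
        return node

    def allocate(self, key):
        """Return the node for `key`, recycling the oldest node if needed."""
        node = self.get(key)
        if node is not None:
            return node
        if len(self._nodes) >= self.capacity:
            self._nodes.popitem(last=False)  # drop the least recently used node
            self.recycled += 1
        node = {"visits": 0, "wins": 0.0}
        self._nodes[key] = node
        return node

    def __len__(self):
        return len(self._nodes)


pool = NodePool(capacity=2)
pool.allocate("a")
pool.allocate("b")
pool.get("a")       # touching "a" makes "b" the oldest entry
pool.allocate("c")  # the pool is full, so "b" is recycled
```

With a pool of this kind, the number of stored nodes never exceeds `capacity`, regardless of how many playouts are run; statistics for a recycled state are lost and must be rebuilt if the state is visited again.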

Paper Structure (12 sections, 3 equations, 6 figures, 1 table)

Introduction
Related work
Methods
GRAVE$^2$
GRAVER
GRAVER$^2$
Experimental results
Two-level search
Node recycling
Node recycling in two-level search
Conclusion
Future work

Figures (6)

Figure 1: Relationships between GRAVE, GRAVE$^2$, GRAVER and GRAVER$^2$. $N$ indicates the total number of nodes stored, $P$ the total number of playouts performed, and $N_{sec}$ the nodes stored in the second-level tree.
Figure 2: Forward node sharing in GRAVE$^2$. The selection path in the top-level tree is fixed while the second-level search is running, and the latter may use AMAF values aggregated in the top-level tree to guide its exploration as long as the second-level root has fewer visits than the parameterized reference threshold. Values obtained from playouts in the second-level tree are backpropagated to the top-level tree after each iteration, rather than in a single batch once the second-level search terminates. If a child of the currently referenced node exceeds the reference threshold, it becomes the new referenced node.
Figure 3: Winrates of GRAVE$^2$ with and without forward sharing and UCT$^2$ against GRAVE with $P = N = 10,000$. The dotted lines indicate the 95% confidence interval of the winrate of GRAVE against itself, representing the region in which compared algorithms can be considered to have playing strength equal to GRAVE.
Figure 4: Winrates of GRAVER and UCT with node recycling against GRAVE with $P = 10,000$. The node pool size used is presented on a logarithmic scale. The red dashed line indicates the threshold $N = 10,000$, beyond which all expanded nodes can be stored and node recycling no longer takes effect.
Figure 5: Winrate of GRAVER$^2$ against GRAVE ($P = 10,000$), varying the ratio of playouts to stored nodes in the top-level tree (Left) and second-level tree (Right).
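
The reference-threshold rule described in the Figure 2 caption can be illustrated with a short sketch. It assumes a simplified setting that is not taken from the paper: a single root-to-leaf selection path given as a list of nodes, with the first node always acting as the initial reference. The names `Node` and `reference_nodes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    visits: int = 0


def reference_nodes(path, ref_threshold):
    """For each node on a selection path, return the node whose AMAF
    statistics would guide selection at that depth.

    Assumption: the reference moves down to a node only once that node's
    visit count exceeds `ref_threshold`; otherwise the last qualifying
    ancestor is kept.
    """
    if not path:
        return []
    reference = path[0]
    result = []
    for node in path:
        if node.visits > ref_threshold:
            reference = node
        result.append(reference)
    return result


path = [Node("root", 100), Node("a", 60), Node("b", 10), Node("c", 2)]
refs = [n.name for n in reference_nodes(path, ref_threshold=50)]
```

Here the nodes `b` and `c` have too few visits to serve as references, so the sketch falls back to the statistics of their ancestor `a`.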
