Generalized Nested Rollout Policy Adaptation with Limited Repetitions
Tristan Cazenave
TL;DR
The paper addresses over-determinism in Generalized Nested Rollout Policy Adaptation (GNRPA) by introducing GNRPALR, which limits the number of repetitions of the best sequence at each level in order to maintain exploration. It reports that this simple repetition-threshold mechanism yields substantial speedups (up to about eightfold) on three challenging combinatorial problems: Inverse RNA Folding (Eterna100), the Traveling Salesman Problem with Time Windows (TSPTW), and Weak Schur Numbers. The gains grow as thinking time increases. The approach preserves the bias-based playouts of GNRPA while enabling longer, more diverse searches, which is relevant for complex decision problems. The work also suggests future directions, such as tuning the repetition parameter $R$ per level and relating the method to Stabilized NRPA as another way to balance exploration.
Abstract
Generalized Nested Rollout Policy Adaptation (GNRPA) is a Monte Carlo search algorithm for optimizing a sequence of choices. We propose to improve GNRPA by avoiding overly deterministic policies that find the same sequence of choices again and again. We do so by limiting the number of repetitions of the best sequence found at a given level. Experiments show that this improves the algorithm on three different combinatorial problems: Inverse RNA Folding, the Traveling Salesman Problem with Time Windows, and the Weak Schur problem.
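
To make the repetition limit concrete, here is a minimal Python sketch of a nested search that stops a level once the lower level has returned the current best sequence a fixed number of times. It is an illustration under stated assumptions, not the paper's implementation. The names `move_probs`, `playout`, `adapt`, `search`, and `max_repeats` (standing in for $R$) are hypothetical. So are the toy fixed-length move encoding, the softmax over weight plus bias, and the exact stop rule (break after `max_repeats` rediscoveries of the best sequence).

```python
import math
import random


def move_probs(policy, bias, step, n_moves):
    """Softmax over (learned weight + fixed bias) for each move at this step."""
    logits = [policy.get((step, m), 0.0) + bias(step, m) for m in range(n_moves)]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def playout(policy, bias, length, n_moves, score_fn, rng):
    """Sample one full sequence from the current policy and score it."""
    seq = [
        rng.choices(range(n_moves), weights=move_probs(policy, bias, step, n_moves))[0]
        for step in range(length)
    ]
    return score_fn(seq), seq


def adapt(policy, bias, seq, n_moves, alpha=1.0):
    """Return a copy of the policy shifted toward the moves of seq."""
    new_policy = dict(policy)
    for step, chosen in enumerate(seq):
        probs = move_probs(policy, bias, step, n_moves)
        for m in range(n_moves):
            target = 1.0 if m == chosen else 0.0
            new_policy[(step, m)] = new_policy.get((step, m), 0.0) + alpha * (target - probs[m])
    return new_policy


def search(level, policy, bias, length, n_moves, score_fn, rng,
           iterations=100, max_repeats=3):
    """Nested search with a cap on rediscoveries of the best sequence.

    Assumption: a level stops early once the lower level has returned the
    current best sequence max_repeats times.
    """
    if level == 0:
        return playout(policy, bias, length, n_moves, score_fn, rng)
    best_score, best_seq, repeats = -math.inf, None, 0
    for _ in range(iterations):
        score, seq = search(level - 1, policy, bias, length, n_moves,
                            score_fn, rng, iterations, max_repeats)
        if score >= best_score:
            if seq == best_seq:
                repeats += 1  # same best sequence found again
                if repeats >= max_repeats:
                    break  # policy has become too deterministic; stop this level
            else:
                best_score, best_seq, repeats = score, seq, 0
        # adapt returns a new dict, so the caller's policy is never mutated
        policy = adapt(policy, bias, best_seq, n_moves)
    return best_score, best_seq
```

A call such as `search(2, {}, lambda step, m: 0.0, 6, 3, score_fn, random.Random(0))` runs a two-level search on sequences of six moves with three options each. When the policy is already near-deterministic, the early break hands control back to the level above instead of spending the remaining iterations on playouts that return the same sequence.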
