Generalized Nested Rollout Policy Adaptation with Limited Repetitions

Tristan Cazenave

TL;DR

The paper addresses over-determinism in generalized Nested Rollout Policy Adaptation (GNRPA) by introducing GNRPALR, which limits repetitions of the best sequence at each level to maintain exploration. It demonstrates that this simple repetition-threshold mechanism yields substantial speedups (up to about eightfold) on three challenging combinatorial problems—Inverse RNA Folding (Eterna100), the Traveling Salesman Problem with Time Windows (TSPTW), and Weak Schur Numbers—especially as thinking time increases. The approach preserves the bias-based playouts of GNRPA while enabling longer, more diverse searches, with practical implications for complex decision problems. The work also suggests future directions such as per-level tuning of the repetition parameter $R$ and links to Stabilized NRPA for further exploration balance.

Abstract

Generalized Nested Rollout Policy Adaptation (GNRPA) is a Monte Carlo search algorithm for optimizing a sequence of choices. We propose to improve on GNRPA by avoiding overly deterministic policies that find the same sequence of choices again and again. We do so by limiting the number of repetitions of the best sequence found at a given level. Experiments show that this improves the algorithm on three different combinatorial problems: Inverse RNA Folding, the Traveling Salesman Problem with Time Windows, and the Weak Schur problem.
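The idea of limiting repetitions can be illustrated with a minimal Python sketch. It is not the paper's algorithm listing: the toy problem (TARGET, score), the zero bias function, the parameter names (n_iter, max_rep, alpha) and the exact stopping rule are all assumptions made for illustration. The sketch assumes that a level stops iterating once the level below has returned the current best sequence more than max_rep times, and that the counter resets when a different best sequence is adopted.

```python
import math
import random

# Hypothetical toy problem: reward sequences that match a hidden target.
N_STEPS, N_MOVES = 8, 3
TARGET = [0, 2, 1, 1, 0, 2, 2, 1]

def score(seq):
    """Number of positions where seq agrees with TARGET (higher is better)."""
    return sum(a == b for a, b in zip(seq, TARGET))

def bias(step, move):
    """GNRPA adds a heuristic bias to each weight; this toy uses none."""
    return 0.0

def probabilities(policy, step):
    """Softmax over (weight + bias) for the moves available at this step."""
    logits = [policy.get((step, m), 0.0) + bias(step, m) for m in range(N_MOVES)]
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def playout(policy, rng):
    """Sample one complete sequence from the policy and score it."""
    seq = [rng.choices(range(N_MOVES), weights=probabilities(policy, s))[0]
           for s in range(N_STEPS)]
    return score(seq), seq

def adapt(policy, seq, alpha=1.0):
    """Return a new policy shifted toward seq; the input policy is not mutated."""
    new_policy = dict(policy)
    for step, chosen in enumerate(seq):
        probs = probabilities(policy, step)
        for m in range(N_MOVES):
            grad = (1.0 if m == chosen else 0.0) - probs[m]
            new_policy[(step, m)] = new_policy.get((step, m), 0.0) + alpha * grad
    return new_policy

def gnrpa_lr(level, policy, n_iter, max_rep, rng):
    """Nested search that leaves a level after too many repeats of its best sequence."""
    if level == 0:
        return playout(policy, rng)
    best_score, best_seq, repetitions = -1, None, 0
    for _ in range(n_iter):
        s, seq = gnrpa_lr(level - 1, policy, n_iter, max_rep, rng)
        if seq == best_seq:
            repetitions += 1
            if repetitions > max_rep:   # assumed stopping rule
                break
        elif s >= best_score:
            best_score, best_seq, repetitions = s, seq, 0
        policy = adapt(policy, best_seq)
    return best_score, best_seq

if __name__ == "__main__":
    print(gnrpa_lr(2, {}, 20, 0, random.Random(0)))
```

With max_rep set to 0, a level is abandoned the first time its best sequence is returned again, so the remaining budget goes to fresh searches from the level above rather than to re-deriving the same sequence.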

Paper Structure (9 sections, 8 equations, 4 figures, 1 table, 4 algorithms)

Introduction
Monte Carlo Search
GNRPA
GNRPALR
Experimental Results
The Inverse RNA Folding Problem
The Traveling Salesman Problem with Time Windows
The Weak Schur Problem
Conclusion

Figures (4)

Figure 1: Gladius problem 90 from Eterna100. The associated target structure is: (....)..(....(...(..(.(..(...(((.(((...((((....)))).(((((..(.(((..(.((((..(.((((..(.((((((((.((((((.(((((.((((.((((.((((...)))).))).)))))).))))).)))))).)))))))..).))))..).))))..).)))..).))))).)))...(((.(((((.(..(((.(..((((.(..((((.(..(((((((.((((((.(((((.((((((.(((.((((((....))))))..)))).)))).))))).)))))).)))))))).)..)))).)..)))).)..))).)..))))).((((....))))...))).)))...)..).)..)...)....)..(....)
Figure 2: Comparison of GNRPA and GNRPALR for Inverse RNA Folding. The number of repetitions is set to 0 for GNRPALR. GNRPALR is eight times faster than GNRPA. It solves 88 problems in 4,096 seconds, whereas GNRPA solves 82. The relative performance of GNRPALR improves with more search time. The tests use the 100 problems of Eterna100.
Figure 3: Comparison of GNRPA and GNRPALR for the TSPTW. The number of repetitions is set to 5 for GNRPALR. GNRPALR is much better than GNRPA for this problem. As the figure shows, GNRPALR reaches the average score that GNRPA obtains after 10,000 seconds approximately 8 times faster. The averages are calculated over 100 runs of each algorithm with seeds ranging from 1 to 100. The problem solved is rc204.1, the most difficult problem from the Solomon test suite for the TSPTW.
Figure 4: Comparison of GNRPA and GNRPALR for the Weak Schur problem of dimension 8. The number of repetitions is set to 0 for GNRPALR. The averages are calculated over 100 runs of each algorithm with seeds ranging from 1 to 100. The improvement of GNRPALR over GNRPA is greater for long search times. GNRPALR reaches the average score that GNRPA obtains after 10,000 seconds approximately 8 times faster.
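The "approximately 8 times" comparisons in the captions above can be read as a ratio of times to reach the same average score. The short Python sketch below shows one way such a ratio might be computed from two anytime curves. The function names and the sample numbers are invented for illustration and are not taken from the paper's data. It assumes each curve is a list of (seconds, average score) pairs sorted by time and that higher scores are better.

```python
def time_to_reach(curve, target):
    """Earliest recorded time at which the curve reaches the target score, or None."""
    for seconds, avg_score in curve:
        if avg_score >= target:
            return seconds
    return None

def speedup(baseline, improved):
    """Ratio of the baseline's final time to the time the improved curve needs
    to reach the baseline's final average score (None if it never does)."""
    final_time, final_score = baseline[-1]
    reached_at = time_to_reach(improved, final_score)
    return None if reached_at is None else final_time / reached_at

if __name__ == "__main__":
    # Invented sample curves, not measurements from the paper.
    baseline = [(1, 10.0), (2, 12.0), (4, 13.0), (8, 14.0)]
    improved = [(1, 14.0), (2, 15.0), (4, 15.5), (8, 16.0)]
    print(speedup(baseline, improved))
```

Because the curves are sampled at discrete times, the ratio is only as precise as the sampling grid, which fits the captions' "approximately".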
