Learning a Prior for Monte Carlo Search by Replaying Solutions to Combinatorial Problems

Tristan Cazenave

TL;DR

The paper introduces a replay-based method that automatically learns a prior for Monte Carlo Search by replaying solved problems and counting how often moves participate in the solutions. The learned bias $\beta_m = \tau \log\left( \frac{\text{count}[\text{code}(m)]}{\text{nb}[\text{code}(m)]} \right)$ makes playouts non-uniform at no computational cost at playout time, and is integrated into GNRPA, a framework that generalizes NRPA. Across Latin Square Completion, Kakuro, and Inverse RNA Folding, the learned priors substantially improve performance over uniform playouts and over manually crafted priors, with domain-specific priors (e.g., NGRAM for RNA from the Rfam database) delivering the best results. The approach is simple, general, and scalable, which suggests it may apply to other difficult combinatorial problems and could be extended with richer move codes and further problem classes.
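The bias formula above can be illustrated with a minimal sketch. It assumes that nb[code] counts how often a move code was available as a legal move during replay and that count[code] counts how often it was the move actually played in the solution; this summary does not define the two counters precisely. The function name `learn_bias`, the toy replay format, and the handling of zero counts are all hypothetical.

```python
import math
from collections import defaultdict

def learn_bias(replays, tau=1.0):
    """Compute bias[code] = tau * log(count[code] / nb[code]).

    replays: iterable of solved problems, each a list of
    (legal_codes, chosen_code) steps (a hypothetical toy format).
    """
    count = defaultdict(int)  # times a code was the move played in a solution
    nb = defaultdict(int)     # times a code was legal during replay (assumed meaning)
    for steps in replays:
        for legal_codes, chosen in steps:
            for code in legal_codes:
                nb[code] += 1
            count[chosen] += 1

    bias = {}
    for code, seen in nb.items():
        ratio = count[code] / seen
        # Assumption: a code that was never played gets -inf, so it is never
        # sampled. The source does not say how zero ratios are handled.
        bias[code] = tau * math.log(ratio) if ratio > 0 else float("-inf")
    return bias

# Two toy solved problems over move codes "a", "b", "c".
replays = [
    [(["a", "b"], "a"), (["a", "c"], "c")],
    [(["a", "b"], "a"), (["b", "c"], "c")],
]
bias = learn_bias(replays, tau=1.0)
# "c" was played every time it was legal, so its ratio is 1 and its bias is 0.0.
# "a" was played 2 of the 3 times it was legal, so its bias is log(2/3).
```

Codes never seen during replay (nb[code] = 0) are simply absent from the returned dictionary in this sketch; how the paper treats them at search time is not stated here.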

Abstract

Monte Carlo Search gives excellent results on multiple difficult combinatorial problems. Using a prior to perform non-uniform playouts during the search greatly improves the results compared to uniform playouts. Handmade heuristics tailored to the combinatorial problem are often used as priors. We propose a method that computes a prior automatically from statistics on solved problems. It is a simple and general method that incurs no computational cost at playout time and brings large performance gains. The method is applied to three difficult combinatorial problems: Latin Square Completion, Kakuro, and Inverse RNA Folding.
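To make "non-uniform playouts" concrete, here is a minimal sketch of how a bias could enter move selection during a playout. It assumes the usual GNRPA softmax form, in which a move is sampled with probability proportional to exp(w_m + β_m), where w_m is the learned policy weight; this summary does not spell that form out. The function names are hypothetical.

```python
import math
import random

def move_probabilities(weights, biases):
    """Softmax over w_m + beta_m for the legal moves of one playout step.

    At least one logit must be finite. A bias of -inf gives probability 0.
    """
    logits = [w + b for w, b in zip(weights, biases)]
    top = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]  # math.exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def sample_move(weights, biases, rng=random):
    """Sample the index of one legal move according to the biased softmax."""
    probs = move_probabilities(weights, biases)
    return rng.choices(range(len(probs)), weights=probs)[0]

# With zero weights and tau = 1, biases log(r_m) make the sampling
# probabilities proportional to the ratios r_m themselves.
ratios = [0.5, 0.25, 0.25]
probs = move_probabilities([0.0, 0.0, 0.0], [math.log(r) for r in ratios])
```

Because the biases are precomputed constants looked up by move code, using them in a playout only adds a table lookup per legal move, which is consistent with the claim that the prior incurs no extra computation at playout time.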

Paper Structure (8 sections, 6 equations, 6 figures, 3 tables, 5 algorithms)

Introduction
Monte Carlo Search
Learning a Prior
Experimental Results
Latin Square Completion
Kakuro
Inverse RNA Folding
Conclusion

Figures (6)

Figure 1: The distribution of the priors for LSC. The priors associated with codes that were never seen during replay (i.e. nb[code] = 0) have been removed.
Figure 2: The median number of random playouts required to solve LSC instances of size 20 with x% of empty cells. The phase transition happens at 42% of empty cells. All further experiments use Latin Squares of size 20 with 42% of empty cells. The median for each percentage was calculated by solving 1,000 problems.
Figure 3: The distribution of the priors for Kakuro. The y-axis gives the number of priors in each range of values. For example, there are 15,410 priors with the value 1.0 and 20,353 priors with a value between 0.0 and 0.1. The priors associated with codes that were never seen during replay (i.e. nb[code] = 0) have been removed. The peak at 0.0 mainly corresponds to the numbers that are impossible given the row and column sums. The smaller peak at 1.0 corresponds to the numbers that are forced. Apart from these two cases, many priors lie strictly between 0.0 and 1.0 and do not correspond to a hard constraint.
Figure 4: The distribution of the priors for Inverse RNA Folding. The y-axis gives the number of priors in each range of values. There are 6 possible moves for a '(' and 4 possible moves for a '.' in the target structure. This gives 10 possibilities for the previous move in the NGRAM and 10 possibilities for the current move, hence 100 different priors. Unlike for LSC and Kakuro, the priors are concentrated on small values. The smallest prior is 0.010083 and the greatest prior is 0.437825.
Figure 5: The number of Eterna100 problems solved by NRPA and by GNRPA with the NGRAM prior, as a function of the logarithm of the search time.
...and 1 more figures
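
The count of 100 NGRAM priors given in the Figure 4 caption can be checked with a short enumeration. The caption states only the number of moves (6 for '(' and 4 for '.'); the concrete move labels below, the six base pairs and the four nucleotides, are an assumption made for illustration, as is the indexing scheme.

```python
from itertools import product

# Assumed move labels: the caption gives only the counts 6 and 4.
PAIRED_MOVES = ["GC", "CG", "AU", "UA", "GU", "UG"]  # moves for a '(' position
UNPAIRED_MOVES = ["A", "C", "G", "U"]                # moves for a '.' position
ALL_MOVES = PAIRED_MOVES + UNPAIRED_MOVES            # 6 + 4 = 10 moves

# One code per (previous move, current move) pair.
ngram_code = {
    (prev, cur): index
    for index, (prev, cur) in enumerate(product(ALL_MOVES, repeat=2))
}

num_codes = len(ngram_code)  # 10 * 10 = 100 codes, one prior each
```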
