Learning a Prior for Monte Carlo Search by Replaying Solutions to Combinatorial Problems
Tristan Cazenave
TL;DR
The paper introduces a replay-based method that automatically learns priors for Monte Carlo Search by counting how often moves appear in the solutions of solved problems. The learned bias $\beta_m = \tau \log\left( \frac{count[code(m)]}{nb[code(m)]} \right)$ steers playouts at no additional playout-time cost, and is integrated into GNRPA, a generalization of NRPA. On Latin Square Completion, Kakuro, and Inverse RNA Folding, the learned priors substantially improve performance over uniform playouts and over manually crafted priors, with domain-specific priors (e.g., NGRAM for RNA, learned from the Rfam database) delivering the best results. The approach is simple, general, and scalable, which suggests it may apply to other difficult combinatorial problems and could be extended with richer move codes and further problem classes.
Abstract
Monte Carlo Search gives excellent results on multiple difficult combinatorial problems. Using a prior to perform non-uniform playouts during the search greatly improves results compared to uniform playouts. Handmade heuristics tailored to the combinatorial problem are often used as priors. We propose a method that computes a prior automatically from statistics on solved problems. It is a simple and general method that incurs no computational cost at playout time and brings large performance gains. The method is applied to three difficult combinatorial problems: Latin Square Completion, Kakuro, and Inverse RNA Folding.
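To make the statistic concrete, here is a minimal Python sketch of a prior of the form $\beta_m = \tau \log(count[code(m)] / nb[code(m)])$ learned by replaying solutions. It is an illustration under stated assumptions, not the paper's implementation: we assume `count` tallies how often a move code was played in a solution and `nb` tallies how often that code was available during the replay; the function names, the input layout (pairs of legal codes and the played code), and the `eps` floor for codes never played are all hypothetical. The second function shows how such a bias would typically enter a softmax playout policy, where the probability of a move is proportional to $\exp(w_m + \beta_m)$.

```python
import math
from collections import Counter


def learn_bias(replays, tau=1.0, eps=1e-6):
    """Learn a bias table from replayed solutions.

    replays: iterable of (legal_codes, played_code) pairs, one per step of a
    replayed solution (hypothetical layout, assumed for this sketch).
    Returns a dict mapping move code -> tau * log(count / nb).
    """
    count = Counter()  # times a code was the move played in a solution
    nb = Counter()     # times a code was among the legal moves (assumption)
    for legal_codes, played_code in replays:
        for code in legal_codes:
            nb[code] += 1
        count[played_code] += 1
    # eps avoids log(0) for codes that were legal but never played
    # (a choice made for this sketch, not taken from the paper).
    return {c: tau * math.log(max(count[c], eps) / nb[c]) for c in nb}


def playout_probs(legal_codes, weights, bias):
    """Softmax over w_m + beta_m for the legal moves of one playout step."""
    logits = [weights.get(c, 0.0) + bias.get(c, 0.0) for c in legal_codes]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy replay data: code "a" is played 3 times out of 4 opportunities.
replays = [
    (["a", "b"], "a"),
    (["a", "b"], "a"),
    (["a", "b"], "b"),
    (["a", "c"], "a"),
]
bias = learn_bias(replays, tau=1.0)
probs = playout_probs(["a", "b"], weights={}, bias=bias)
```

Because the bias is a table computed once before the search, looking it up during a playout is a dictionary access, which is consistent with the claim that the prior adds no computational cost at playout time.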
