UNSAT Solver Synthesis via Monte Carlo Forest Search

Chris Cameron; Jason Hartford; Taylor Lundy; Tuan Truong; Alan Milligan; Rex Chen; Kevin Leyton-Brown

UNSAT Solver Synthesis via Monte Carlo Forest Search

Chris Cameron, Jason Hartford, Taylor Lundy, Tuan Truong, Alan Milligan, Rex Chen, Kevin Leyton-Brown

TL;DR

Knuth Synthesis is the first RL approach to avoid the prohibitive costs of policy evaluations in an exponentially-sized tree, leveraging two key ideas: first, it estimates tree size by randomly sampling paths and measuring their lengths, drawing on an unbiased approximation due to Knuth (1975); second, it queries a strong solver at a user-defined depth rather than learning a policy across the whole tree, to focus the policy search on early decisions that offer the greatest potential for reducing tree size.

Abstract

We introduce Monte Carlo Forest Search (MCFS), a class of reinforcement learning (RL) algorithms for learning policies in {tree MDPs}, for which policy execution involves traversing an exponential-sized tree. Examples of such problems include proving unsatisfiability of a SAT formula; counting the number of solutions of a satisfiable SAT formula; and finding the optimal solution to a mixed-integer program. MCFS algorithms can be seen as extensions of Monte Carlo Tree Search (MCTS) to cases where, rather than finding a good path (solution) within a tree, the problem is to find a small tree within a forest of candidate trees. We instantiate and evaluate our ideas in an algorithm that we dub Knuth Synthesis, an MCFS algorithm that learns DPLL branching policies for solving the Boolean satisfiability (SAT) problem, with the objective of achieving good average-case performance on a given distribution of unsatisfiable problem instances. Knuth Synthesis is the first RL approach to avoid the prohibitive costs of policy evaluations in an exponentially-sized tree, leveraging two key ideas: first, we estimate tree size by randomly sampling paths and measuring their lengths, drawing on an unbiased approximation due to Knuth (1975); second, we query a strong solver at a user-defined depth rather than learning a policy across the whole tree, to focus our policy search on early decisions that offer the greatest potential for reducing tree size. We matched or exceeded the performance of a strong baseline on three well-known SAT distributions, facing problems that were two orders of magnitude more challenging than those addressed in previous RL studies.

UNSAT Solver Synthesis via Monte Carlo Forest Search

TL;DR

Abstract

Paper Structure (39 sections, 2 theorems, 3 equations, 6 figures, 5 tables, 7 algorithms)

This paper contains 39 sections, 2 theorems, 3 equations, 6 figures, 5 tables, 7 algorithms.

Introduction
Related Work
Preliminaries
Monte Carlo Tree Search
Boolean Satisfiability
DPLL and Variable Selection Policies
Tree MDPs
Monte Carlo Forest Search
Knuth Synthesis
Nested Monte Carlo Sampling
Bounding Policy Evaluations
Implementation
Tree Policy $\alpha$
Equivariant Architecture
Sharing Prior with Child Nodes
...and 24 more sections

Key Result

Theorem 1

Let $N$ be the set of nodes in a weighted $k$-ary tree $T$ where each node $n\in N$ has weight $w(n)$. The total weight of this tree is $w_{T} = \sum_{n\in N}w(n)$. Let $(n_{0},\ldots, n_{\ell})$ be a path of nodes through the tree from root to leaf that is chosen uniformly at random. Then, $w_{T} =

Figures (6)

Figure 1: A proof tree shows that any variable assignment to an unsatisfiable instance leads to a conflict. Its size substantially varies with the choice of branching policy.
Figure 2: Two keys ideas of Knuth Synthesis to avoid the prohibitive costs of exact tree-size policy evaluations: (1) bounded-depth search and (2) Knuth samples.
Figure 3: MCTSRollout($\alpha,\gamma,\pi$)
Figure 4: Convergence rate improvements with Knuth samples. We normalize tree size across instances by the per-instance minimum and maximum tree size ever observed. Shaded regions represent 95% confidence intervals.
Figure 5: Graph-Q-SAT vs. Knuth synthesis on unsatisfiable R3SAT problems with increasing size from 50 to 250 variabels at 50 variable increments. Performance is measured as the multiplicative factor of improvement over minisat.
...and 1 more figures

Theorems & Definitions (5)

definition thmcounterdefinition
definition thmcounterdefinition: Scavuzzo2022
Theorem 1: Knuth1975
Corollary 1
proof

UNSAT Solver Synthesis via Monte Carlo Forest Search

TL;DR

Abstract

UNSAT Solver Synthesis via Monte Carlo Forest Search

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (6)

Theorems & Definitions (5)