Table of Contents
Fetching ...

Boolean Satisfiability via Imitation Learning

Zewei Zhang, Huan Liu, Yuanhao Yu, Jun Chen, Xiangyu Xu

TL;DR

ImSAT confronts the long-standing challenge of branching in conflict-driven clause learning by training a policy through imitation learning on expert KeyTrace traces. By collapsing full solver runs into surviving decisions and treating branching as prefix-conditioned autoregressive sequence prediction, ImitSAT delivers dense, high-quality supervision with minimal detours, enabling stable and efficient training. Empirical results show substantial reductions in propagation counts and favorable wall-clock time, with strong generalization to unseen SAT families beyond simple random 3-SAT. The work demonstrates that imitation-based guidance can outperform prior learned approaches and integrates naturally into CDCL without compromising solver completeness or robustness.

Abstract

We propose ImitSAT, a branching policy for conflict-driven clause learning (CDCL) solvers based on imitation learning for the Boolean satisfiability problem (SAT). Unlike previous methods that predict instance-level signals to improve CDCL branching indirectly, or rely on reinforcement learning and insufficient CDCL information to enhance branching, ImitSAT learns from expert KeyTrace that collapses a full run into the sequence of surviving decisions. Replaying a KeyTrace on the same instance is nearly conflict-free, providing dense decision-level supervision and directly reducing propagations -- the dominant contributor to wall-clock time. This prefix-conditioned supervision enables ImitSAT to reproduce high-quality branches without exploration, yielding faster convergence, stable training, and seamless integration into CDCL. Extensive experiments demonstrate that ImitSAT reduces propagation counts and runtime, outperforming state-of-the-art learned approaches. We released the source code and trained model at https://github.com/zewei-Zhang/ImitSAT

Boolean Satisfiability via Imitation Learning

TL;DR

ImSAT confronts the long-standing challenge of branching in conflict-driven clause learning by training a policy through imitation learning on expert KeyTrace traces. By collapsing full solver runs into surviving decisions and treating branching as prefix-conditioned autoregressive sequence prediction, ImitSAT delivers dense, high-quality supervision with minimal detours, enabling stable and efficient training. Empirical results show substantial reductions in propagation counts and favorable wall-clock time, with strong generalization to unseen SAT families beyond simple random 3-SAT. The work demonstrates that imitation-based guidance can outperform prior learned approaches and integrates naturally into CDCL without compromising solver completeness or robustness.

Abstract

We propose ImitSAT, a branching policy for conflict-driven clause learning (CDCL) solvers based on imitation learning for the Boolean satisfiability problem (SAT). Unlike previous methods that predict instance-level signals to improve CDCL branching indirectly, or rely on reinforcement learning and insufficient CDCL information to enhance branching, ImitSAT learns from expert KeyTrace that collapses a full run into the sequence of surviving decisions. Replaying a KeyTrace on the same instance is nearly conflict-free, providing dense decision-level supervision and directly reducing propagations -- the dominant contributor to wall-clock time. This prefix-conditioned supervision enables ImitSAT to reproduce high-quality branches without exploration, yielding faster convergence, stable training, and seamless integration into CDCL. Extensive experiments demonstrate that ImitSAT reduces propagation counts and runtime, outperforming state-of-the-art learned approaches. We released the source code and trained model at https://github.com/zewei-Zhang/ImitSAT

Paper Structure

This paper contains 43 sections, 19 equations, 5 figures, 10 tables, 1 algorithm.

Figures (5)

  • Figure 1: From a CDCL run to a KeyTrace. The left pane sketches one run in three stages: (1) decide $x_4=\top$, then $x_3=\top$, hit a conflict, backtrack to the decision on $x_3$; (2) set $x_3=\bot$, conflict again, backtrack to $x_4$; (3) set $x_4=\bot$, then decide $x_1=\top$ and $x_2=\top$ and solve. Unit assignments are omitted for clarity. Collapsing backtracks removes detours and keeps only the surviving root–to–current decisions, yielding the compact KeyTrace on the right, which serves as the expert for imitation. A step‑by‑step walkthrough is in Appendix \ref{['appendix:keytrace']}.
  • Figure 2: Propagation dominates CDCL time. MiniSAT spends about 88.9% in propagation, 9.2% in conflict analysis, and 1.9% in branching. Reducing propagation is therefore the main route to wall‑clock gains.
  • Figure 3: Compact KeyTrace replay. The expert replayed only a small share of MiniSAT events: 0.2% conflicts, 19.6% decisions, and 4.3% propagations.
  • Figure 4: Wall‑clock time versus instances solved on random test sets and structured families. Curves show end‑to‑end solver time per instance with all model calls included and preprocessing excluded. ImitSAT and Graph‑Q‑SAT use 3 calls per instance. SATformer performs a single VSIDS initialization. The lower curve is better.
  • Figure 5: Training and validation loss curves with and without variable‑permutation augmentation on the 5–15 range for 20 epochs. Permutation keeps training and validation closely aligned and prevents overfitting, while removing it lowers training loss but drives validation loss up.