Reinforcement Learning and Data-Generation for Syntax-Guided Synthesis
Julian Parsert, Elizabeth Polgreen
TL;DR
Problem: automatic synthesis of programs under both syntactic and logical constraints (SyGuS). Approach: treat SyGuS as a tree search over a context-free grammar and solve it with reinforcement-learning-guided Monte-Carlo Tree Search (MCTS), learning policy and value functions in an RL loop and verifying candidates with an SMT solver; training data is generated automatically through anti-unification of existing SMT problems. Contributions: (i) a novel MCTS-based synthesis algorithm with learned guidance, (ii) a data-generation pipeline that uses anti-unification to produce large, feasible training sets, and (iii) empirical results showing an improvement of over 26 percentage points over a baseline, more problems solved than cvc5 on the training set, and comparable performance to cvc5 on the testing set, plus a public dataset for further research. Significance: shows that learned guidance for grammar-based search improves SyGuS solving in practice, and that the released dataset enables further machine learning experiments in program synthesis.
Abstract
Program synthesis is the task of automatically generating code from a specification. In Syntax-Guided Synthesis (SyGuS), this specification is a combination of a syntactic template and a logical formula, and the result is guaranteed to satisfy both. We present a reinforcement-learning-guided algorithm for SyGuS that uses Monte-Carlo Tree Search (MCTS) to search the space of candidate solutions. Our algorithm learns policy and value functions which, combined with the upper confidence bound for trees, allow it to balance exploration and exploitation. A common challenge in applying machine learning approaches to syntax-guided synthesis is the scarcity of training data. To address this, we present a method for automatically generating training data for SyGuS based on anti-unification of existing first-order satisfiability problems, which we use to train our MCTS policy. We implement and evaluate this setup and demonstrate that the learned policy and value functions improve synthesis performance over a baseline by over 26 percentage points on both the training and testing sets. Our tool outperforms the state-of-the-art tool cvc5 on the training set and performs comparably in terms of the total number of problems solved on the testing set (solving 23% of the benchmarks on which cvc5 fails). We make our dataset publicly available to enable further application of machine learning methods to the SyGuS problem.
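To make the term "anti-unification" concrete, the following is a minimal sketch of classic syntactic anti-unification (least general generalization) of two first-order terms. It is not the data-generation pipeline itself, whose details are not given in this abstract. The tuple encoding of terms, the function name `anti_unify`, and the `?X0`, `?X1` placeholder names are all assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple, Union

# Assumed encoding (illustrative only): a term is either a string
# (a constant or variable symbol) or a tuple (symbol, arg1, ..., argN).
Term = Union[str, tuple]


def anti_unify(
    s: Term,
    t: Term,
    store: Optional[Dict[Tuple[Term, Term], str]] = None,
) -> Term:
    """Return the least general generalization of the terms s and t.

    Subterms on which s and t disagree are replaced by placeholders.
    The same pair of disagreeing subterms always maps to the same
    placeholder, which keeps the generalization as specific as possible.
    """
    if store is None:
        store = {}
    # Identical terms generalize to themselves.
    if s == t:
        return s
    # Same head symbol and same arity: generalize argument-wise.
    if (
        isinstance(s, tuple)
        and isinstance(t, tuple)
        and s[0] == t[0]
        and len(s) == len(t)
    ):
        args = tuple(anti_unify(a, b, store) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    # Otherwise the terms clash: introduce (or reuse) a placeholder.
    key = (s, t)
    if key not in store:
        store[key] = f"?X{len(store)}"
    return store[key]


# Usage: (x * 2) + 1 and (y * 2) + 3 generalize to (?X0 * 2) + ?X1.
left = ("+", ("*", "x", "2"), "1")
right = ("+", ("*", "y", "2"), "3")
generalized = anti_unify(left, right)
```

In this sketch the placeholders mark exactly the positions where the two input terms differ. How such generalized terms are turned into SyGuS training problems is specific to the method and is not reproduced here.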
