BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration

Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai

TL;DR

The paper addresses the challenge of search efficiency in program synthesis by integrating a learning-guided, bottom-up search that uses execution results of partial programs as semantic features. A lightweight classifier, trained with property-signature features derived from inputs, outputs, and intermediate values, reweights sub-expressions during enumeration to favor those likely to appear in a final solution. The approach is evaluated on two string-transformation benchmarks, showing that Bustle, especially when combined with domain heuristics, outperforms baselines and end-to-end neural approaches in both solved-task counts and wall-clock time. Key contributions include the property-signature framework, batched inference for speed, and a synthetic data generation protocol for training the classifier, with demonstrated generalization to human-written benchmarks.

Abstract

Program synthesis is challenging largely because of the difficulty of search in a large space of programs. Human programmers routinely tackle the task of writing complex programs by writing sub-programs and then analyzing their intermediate results to compose them in appropriate ways. Motivated by this intuition, we present a new synthesis approach that leverages learning to guide a bottom-up search over programs. In particular, we train a model to prioritize compositions of intermediate values during search conditioned on a given set of input-output examples. This is a powerful combination because of several emergent properties. First, in bottom-up search, intermediate programs can be executed, providing semantic information to the neural network. Second, given the concrete values from those executions, we can exploit rich features based on recent work on property signatures. Finally, bottom-up search allows the system substantial flexibility in what order to generate the solution, allowing the synthesizer to build up a program from multiple smaller sub-programs. Overall, our empirical evaluation finds that the combination of learning and bottom-up search is remarkably effective, even with simple supervised learning approaches. We demonstrate the effectiveness of our technique on two datasets, one from the SyGuS competition and one of our own creation.
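
To make the idea of property-signature features concrete, the following is a minimal sketch and not the paper's implementation. It assumes a property is a boolean function of an intermediate value and the target output, and that a signature summarizes each property across the I/O examples as all-true, all-false, or mixed. The five properties and the names `PROPERTIES`, `Summary`, and `property_signature` are hypothetical, chosen only for illustration.

```python
from enum import Enum
from typing import Callable, List, Sequence


class Summary(Enum):
    """How one property behaves across all I/O examples."""
    ALL_TRUE = "all_true"
    ALL_FALSE = "all_false"
    MIXED = "mixed"


# Hypothetical properties relating an intermediate value to the desired output.
# The property set used in the paper is not reproduced here.
PROPERTIES: List[Callable[[str, str], bool]] = [
    lambda v, out: v == out,                  # value already equals the output
    lambda v, out: v in out,                  # value is a substring of the output
    lambda v, out: out.startswith(v),         # value is a prefix of the output
    lambda v, out: len(v) <= len(out),        # value is no longer than the output
    lambda v, out: v.lower() == out.lower(),  # equal up to letter case
]


def summarize(bits: Sequence[bool]) -> Summary:
    """Collapse per-example booleans into a single three-way summary."""
    if all(bits):
        return Summary.ALL_TRUE
    if not any(bits):
        return Summary.ALL_FALSE
    return Summary.MIXED


def property_signature(values: Sequence[str], outputs: Sequence[str]) -> List[Summary]:
    """Featurize one intermediate value (one result per example) against the outputs."""
    return [
        summarize([prop(v, o) for v, o in zip(values, outputs)])
        for prop in PROPERTIES
    ]


# An intermediate expression evaluated on two examples, compared with the targets.
signature = property_signature(["Jo", "Al"], ["John", "alice"])
```

A fixed-length vector like `signature` is what a classifier could consume, regardless of how many examples the task provides or how long the strings are.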

Paper Structure

This paper contains 19 sections, 1 equation, 4 figures, 1 algorithm.

Figures (4)

  • Figure 1: Domain-specific language (DSL) of expressions considered in this paper.
  • Figure 2: Diagram outlining the Bustle approach. The bold arrows show the main feedback loop. The bottom-up synthesizer enumerates values (code expressions), which are featurized along with the I/O example using property signatures. The property signatures are passed to a trained model that reweights the value based on whether it appears to be a sub-expression of a solution, and the reweighted value is given back to the bottom-up synthesizer for use in enumerating larger expressions.
  • Figure 3: (Left) Benchmarks solved as a function of intermediate expressions considered. This metric makes Bustle look somewhat better than it is, because it ignores slowdowns in wall-clock time, but it is still important to analyze. It is invariant to engineering considerations, providing an upper bound on how well we can do in wall-clock terms through speeding up the model. (Right) Benchmarks solved over elapsed wall-clock time. Bustle still outperforms all baselines on our 38 new tasks, but not by quite as much due to time spent on model inference.
  • Figure 4: Histograms of model predictions for expressions seen while solving benchmarks. (Left) For expressions that were sub-expressions of a solution, the majority received predictions close to 1, showing that the model can identify the correct expressions to prioritize during search. (Right) For expressions that were not sub-expressions of a solution, predictions skewed close to 0.
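
The feedback loop described in the Figure 2 caption can be sketched as a weight-ordered bottom-up enumeration in which a model's score shifts each new value to a later or earlier weight bucket. This is a toy illustration under stated assumptions, not the authors' code: the three operations (`Concat`, `Upper`, `First`) are hypothetical stand-ins for the DSL of Figure 1, `toy_model` is a hand-written heuristic standing in for the trained classifier, and the `penalty` mapping from score to extra weight is invented for this example.

```python
from itertools import product
from typing import Dict, Iterator, List, Optional, Sequence, Tuple

Value = Tuple[str, ...]  # one concrete result per I/O example

# Hypothetical mini-DSL: (name, arity, implementation). Not the paper's DSL.
OPS = [
    ("Concat", 2, lambda a, b: a + b),
    ("Upper", 1, lambda a: a.upper()),
    ("First", 1, lambda a: a[:1]),
]


def toy_model(value: Value, outputs: Value) -> float:
    """Stand-in for the trained classifier: the fraction of examples where the
    value appears, ignoring case, inside the desired output."""
    hits = sum(1 for v, o in zip(value, outputs) if v and v.lower() in o.lower())
    return hits / len(outputs)


def penalty(prob: float, max_penalty: int = 2) -> int:
    """Assumed mapping: a low score pushes the value to a later weight bucket."""
    return round((1.0 - prob) * max_penalty)


def partitions(total: int, parts: int) -> Iterator[Tuple[int, ...]]:
    """Yield every tuple of `parts` positive integers that sums to `total`."""
    if parts == 1:
        if total >= 1:
            yield (total,)
        return
    for first in range(1, total - parts + 2):
        for rest in partitions(total - first, parts - 1):
            yield (first,) + rest


def synthesize(inputs: Sequence[str], outputs: Sequence[str],
               max_weight: int = 6) -> Optional[str]:
    """Enumerate expressions by increasing weight, reweighting each new value."""
    target: Value = tuple(outputs)
    x: Value = tuple(inputs)
    if x == target:
        return "x"
    seen: Dict[Value, str] = {x: "x"}  # keep one expression per distinct value
    by_weight: Dict[int, List[Value]] = {w: [] for w in range(1, max_weight + 1)}
    by_weight[1].append(x)
    for w in range(2, max_weight + 1):
        for name, arity, fn in OPS:
            # The operation costs 1; its arguments share the remaining weight.
            for split in partitions(w - 1, arity):
                for args in product(*(by_weight[k] for k in split)):
                    value = tuple(fn(*per_ex) for per_ex in zip(*args))
                    if value in seen:
                        continue
                    expr = f"{name}({', '.join(seen[a] for a in args)})"
                    if value == target:
                        return expr
                    seen[value] = expr
                    # Reweighting step: promising values stay cheap, others sink.
                    new_w = w + penalty(toy_model(value, target))
                    if new_w <= max_weight:
                        by_weight[new_w].append(value)
    return None


program = synthesize(["john", "mary"], ["J", "M"])
```

Because every candidate is executed on the examples before it is scored, the scoring function sees concrete strings rather than syntax alone, which is the property of bottom-up search that the abstract highlights. In this sketch the reweighted value is only ever placed at its current weight or a later one, so buckets already being read are never modified during enumeration.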