SPoC: Search-based Pseudocode to Code

Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, Percy Liang

TL;DR

This work tackles the challenge of synthesizing long, functionally correct programs from human-authored pseudocode by framing translation as a search problem over per-line code candidates. It introduces SPoC, a dataset of 18,356 programs with human-authored pseudocode and test cases, which allows evaluation by functional correctness rather than surface metrics. To guide the search, the authors propose error localization based on compilation errors, with two methods: a multiclass predictor and prefix-based pruning. Under a budget of 100 program compilations, search raises the synthesis success rate from 25.6% (top-one translation) to 44.7%.

Abstract

We consider the task of mapping pseudocode to long programs that are functionally correct. Given test cases as a mechanism to validate programs, we search over the space of possible translations of the pseudocode to find a program that passes the validation. However, without proper credit assignment to localize the sources of program failures, it is difficult to guide search toward more promising programs. We propose to perform credit assignment based on signals from compilation errors, which constitute 88.7% of program failures. Concretely, we treat the translation of each pseudocode line as a discrete portion of the program, and whenever a synthesized program fails to compile, an error localization method tries to identify the portion of the program responsible for the failure. We then focus search over alternative translations of the pseudocode for those portions. For evaluation, we collected the SPoC dataset (Search-based Pseudocode to Code) containing 18,356 programs with human-authored pseudocode and test cases. Under a budget of 100 program compilations, performing search improves the synthesis success rate over using the top-one translation of the pseudocode from 25.6% to 44.7%.
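
The search described above can be sketched as a best-first enumeration over combinations of per-line candidates. This is a minimal illustration, not the authors' implementation: the function name `best_first_search`, the `passes_tests` callback (standing in for compiling the program and running the test cases), and the toy probabilities are all assumptions made for the example. It assumes each candidate carries a strictly positive probability and that a program's score is the product of its per-line probabilities.

```python
import heapq
import math
from typing import Callable, List, Optional, Sequence, Tuple

Candidate = Tuple[str, float]  # (code line, probability); hypothetical format


def best_first_search(
    candidates: Sequence[Sequence[Candidate]],
    passes_tests: Callable[[List[str]], bool],
    budget: int = 100,
) -> Tuple[Optional[List[str]], int]:
    """Try candidate combinations in order of decreasing joint probability.

    Returns (program, attempts); program is None if the budget runs out.
    """
    # Rank each line's candidates from most to least probable.
    ranked = [sorted(c, key=lambda p: -p[1]) for c in candidates]

    def log_score(idx: Tuple[int, ...]) -> float:
        return sum(math.log(ranked[i][j][1]) for i, j in enumerate(idx))

    start = tuple(0 for _ in ranked)
    heap = [(-log_score(start), start)]  # min-heap on negated score
    seen = {start}
    attempts = 0
    while heap and attempts < budget:
        _, idx = heapq.heappop(heap)
        program = [ranked[i][j][0] for i, j in enumerate(idx)]
        attempts += 1  # one compile-and-test call spent
        if passes_tests(program):
            return program, attempts
        # Successors: move exactly one line to its next-best candidate.
        for i in range(len(idx)):
            if idx[i] + 1 < len(ranked[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-log_score(nxt), nxt))
    return None, attempts


# Toy example with three lines and two candidates each (made-up numbers).
toy = [
    [("c11", 0.9), ("c12", 0.1)],
    [("c21", 0.6), ("c22", 0.4)],
    [("c31", 0.7), ("c32", 0.3)],
]
program, attempts = best_first_search(
    toy, lambda p: p == ["c11", "c22", "c32"], budget=100
)
print(program, attempts)  # ['c11', 'c22', 'c32'] 4
```

Because candidates within each line are sorted, every successor scores no higher than its parent, so the heap yields combinations in non-increasing probability. Without credit assignment, every failed attempt costs one unit of budget and tells the search nothing about which line was at fault, which is the gap error localization is meant to fill.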

Paper Structure

This paper contains 24 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Given $L$ pseudocode lines $x_{1:L}$ (with indentation levels $\ell_{1:L}$) and public test cases, our task is to synthesize a program with code lines $y_{1:L}$. The program is evaluated against both public and hidden test cases.
  • Figure 2: Illustration of best-first search and error localization model. In this example, ($c_{11}, c_{22}, c_{32}$) satisfies the test cases. Best-first search iterates in the order of decreasing probabilities and succeeds in four compiler calls. The error localization method down-weights $c_{21}$, leading to an earlier success.
  • Figure 3: (a) While the translation accuracy is high at the line level, we need to consider the result at the program level. For each program, we count the number of lines $i$ where (b) the top candidate $c_{i1}$ is incorrect, and (c) none of the candidates $c_{ij} \in C_i$ is correct.
  • Figure 4: Success rates at budgets $B$ of best-first search with different error localization methods.
  • Figure 5: Examples of programs synthesized during search. In Program 1, prefix-based pruning detects that the prefix up to line 9 is offending. In Program 2, the multiclass model incorrectly predicts line 3 as the offending line, which ultimately leads to a failure.
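
The idea behind prefix-based pruning mentioned in Figure 5 can be illustrated with a small sketch. The captions only say that the method detects an offending prefix, so everything below is an assumption made for illustration: the helper names, the `prefix_compiles` callback (a stand-in for a real compiler check on a partial program), and the linear scan over prefix lengths are not taken from the paper.

```python
from typing import Callable, List, Optional, Sequence, Set, Tuple


def shortest_failing_prefix(
    lines: Sequence[str],
    prefix_compiles: Callable[[Sequence[str]], bool],
) -> Optional[int]:
    """Return the smallest k such that lines[:k] fails the check.

    Returns None if every prefix, including the full program, passes.
    Each check is assumed to cost one compiler call.
    """
    for k in range(1, len(lines) + 1):
        if not prefix_compiles(lines[:k]):
            return k
    return None


class PrefixPruner:
    """Remembers offending prefixes of candidate indices (hypothetical helper)."""

    def __init__(self) -> None:
        self.bad_prefixes: Set[Tuple[int, ...]] = set()

    def record(self, choice: Sequence[int], k: int) -> None:
        # choice[i] is the index of the candidate used for line i.
        self.bad_prefixes.add(tuple(choice[:k]))

    def is_pruned(self, choice: Sequence[int]) -> bool:
        # A combination is skipped if it starts with any recorded bad prefix.
        return any(
            tuple(choice[: len(p)]) == p for p in self.bad_prefixes
        )


# Toy check: a prefix "fails to compile" if it contains the marker "BAD".
def toy_check(prefix: Sequence[str]) -> bool:
    return all("BAD" not in line for line in prefix)


failed_program: List[str] = ["int x;", "BAD", "return 0;"]
failed_choice = (0, 1, 0)  # candidate indices that produced failed_program

k = shortest_failing_prefix(failed_program, toy_check)
pruner = PrefixPruner()
if k is not None:
    pruner.record(failed_choice, k)

print(k)                            # 2
print(pruner.is_pruned((0, 1, 1)))  # True: shares the offending prefix
print(pruner.is_pruned((0, 0, 1)))  # False: differs at line 2
```

In this sketch, one failed program can rule out every remaining combination that shares its offending prefix, at the price of extra compiler calls to locate that prefix. The linear scan is the simplest choice; a binary search over prefix lengths would use fewer calls if failure is monotone in the prefix length, which is itself an assumption.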