Generating Pragmatic Examples to Train Neural Program Synthesizers

Saujas Vaduguru, Daniel Fried, Yewen Pu

TL;DR

PraX tackles the ambiguity inherent in programming-by-example by amortizing pragmatic reasoning through self-generated data. It jointly trains neural speaker and listener models, using RSA-based selection to curate informative training examples, thereby disambiguating user intent without human supervision. In the regular expression domain, PraX outperforms non-pragmatic baselines and matches the performance of models fine-tuned on human pragmatic data, while requiring no human annotations for training. This approach offers a scalable path to pragmatic program synthesis and suggests potential extensions to more open-ended coding tasks and multimodal specifications.

Abstract

Programming-by-example is the task of synthesizing a program that is consistent with a set of user-provided input-output examples. As examples often under-specify one's intent, a good synthesizer must choose the intended program from the many that are consistent with the given set of examples. Prior work frames program synthesis as a cooperative game between a listener (that synthesizes programs) and a speaker (a user choosing examples), and shows that models of computational pragmatic inference are effective in choosing the user-intended programs. However, these models require counterfactual reasoning over a large set of programs and examples, which is infeasible in realistic program spaces. In this paper, we propose PraX, a novel way to amortize this search with neural networks. We sample pairs of programs and examples via self-play between listener and speaker models, and use pragmatic inference to choose informative training examples from this sample. We then use the informative dataset to train models to improve the synthesizer's ability to disambiguate user-provided examples without human supervision. We validate PraX on the challenging task of synthesizing regular expressions from example strings, and find that our method (1) outperforms models trained without choosing pragmatic examples by 23% (a 51% relative increase), and (2) matches the performance of supervised learning on a dataset of pragmatic examples provided by humans, despite using no human data in training.
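
As a rough sanity check on the two figures quoted above, suppose the "23%" is an absolute gain in percentage points over a baseline score $b$ (an assumption; the abstract names neither the metric nor the baseline value here). The 51% relative increase then implies approximate values for both scores:

$$
\frac{\Delta}{b} = 0.51, \quad \Delta = 23 \;\Longrightarrow\; b \approx \frac{23}{0.51} \approx 45.1, \qquad b + \Delta \approx 68.1 .
$$

These are derived values under that reading, not numbers reported in the text above.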

Paper Structure

This paper contains 38 sections, 4 equations, 11 figures, 2 tables, 2 algorithms.

Figures (11)

  • Figure 1: PraX iteratively generates datasets containing increasingly informative program specifications (lists of examples consistent with the program), and updates models on the generated datasets. ➀ We use a Speaker model, which generates an example consistent with a target program, to propose a set of candidate specifications. Using the Rational Speech Acts model of pragmatic reasoning (red box; described in Figure 2), we choose the example that is most informative to a Listener model that synthesizes programs consistent with a given specification. In this manner, we incrementally build the list of examples (the spec) for the program. We repeat this for different programs to create a dataset of informative program-spec pairs. ➁ We use the dataset to update the Speaker and Listener models. We train the speaker to generate the selected pragmatic examples, and the listener to synthesize the target program given the generated examples.
  • Figure 2: An illustration of how the Rational Speech Acts framework is used to select an informative example for a given program. We start with the matrix $M$ corresponding to the consistency relation between the sample of programs and examples shown in Figure 1. We obtain a literal listener distribution $L_0$ over programs for each example by normalizing the rows of this matrix. Since the $M$ matrix is binary, each row in $L_0$ is a uniform distribution over consistent programs in the sample: any of the consistent programs is equally likely to be the intended program. We then obtain a pragmatic speaker distribution $S_1$ by normalizing the columns of the $L_0$ matrix; this models the probability that an informative speaker would choose each example when communicating a program to a literal listener. RSA outputs the highest-probability example in $S_1$ (e.g., (aa, $\textrm{✓}$)) in the column corresponding to the target program (e.g., a+b*).
  • Figure 3: Performance of various models as a function of turns, measured in (a) $\textsc{Top-1}@t$, (b) $\textsc{Top-10}@t$, and (c) $\textsc{Edit Distance}\leq 1@t$. Lines show averages, and bands are standard errors. Our model PraX, trained entirely from self-play and RSA inference without using human-provided data, performs better than the non-pragmatic Literal model across all turns and metrics, and matches the performance of hft, which is tuned on human-provided examples.
  • Figure 4: Example specifications for two programs provided during the user study, along with the highest ranked guess from the Literal and the PraX models.
  • Figure 5: Top-1 metric over the rounds of training of the PraX model. We report the metric on the validation set as well as on all interactions from the user study in the replay setting (similar to how we evaluated GPT-3.5). We compare the accuracy over rounds of training to generating specifications and updating the models only once, amounting to a single round of the procedure with more programs (PraX-single-round). We also compare to fine-tuning the base model on 400 pairs (the same number as hft) generated by the speaker in the 5th round of training (PraX-hft-match) to assess the quality of our speaker-generated examples.
  • ...and 6 more figures
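
The selection step described in the Figure 2 caption (binary consistency matrix $M$, row-normalized literal listener $L_0$, column-normalized pragmatic speaker $S_1$, then an argmax in the target program's column) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the four regular expressions, and the four examples are all made up for this sketch and are not the sample shown in the figure. Consistency is checked with Python's `re.fullmatch`, which is also an assumption about how a string is judged to match a program.

```python
import re

# Hypothetical toy sample (not the one in the paper's figure).
PROGRAMS = ["a+b*", "a*b+", "b+", "[ab]+"]
# Each example is (string, label); label True means the string should match.
EXAMPLES = [("aa", True), ("ab", True), ("b", True), ("abb", True)]


def consistency_matrix(programs, examples):
    """M[i][j] = 1 if program j agrees with example i's label, else 0."""
    return [
        [1.0 if (re.fullmatch(p, s) is not None) == label else 0.0
         for p in programs]
        for (s, label) in examples
    ]


def normalize_rows(matrix):
    """Scale each row to sum to 1; all-zero rows are left as zeros."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([v / total if total > 0 else 0.0 for v in row])
    return out


def normalize_cols(matrix):
    """Scale each column to sum to 1 by transposing around normalize_rows."""
    transposed = [list(col) for col in zip(*matrix)]
    normalized = normalize_rows(transposed)
    return [list(row) for row in zip(*normalized)]


def rsa_select(programs, examples, target):
    """Return the example with the highest S1 probability for `target`."""
    m = consistency_matrix(programs, examples)
    l0 = normalize_rows(m)    # literal listener: P(program | example)
    s1 = normalize_cols(l0)   # pragmatic speaker: P(example | program)
    j = programs.index(target)
    best = max(range(len(examples)), key=lambda i: s1[i][j])
    return examples[best], s1


chosen, s1 = rsa_select(PROGRAMS, EXAMPLES, "a+b*")
print(chosen)
```

In this toy sample, `("aa", True)` is consistent with only two of the four programs, so the literal listener puts more mass on `a+b*` after hearing it than after hearing `("ab", True)` or `("abb", True)`, which three programs satisfy. That is why column normalization ranks it highest for `a+b*`.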