Amortizing Pragmatic Program Synthesis with Rankings

Yewen Pu, Saujas Vaduguru, Priyan Vaithilingam, Elena Glassman, Daniel Fried

TL;DR

The paper tackles the scalability challenge of Rational Speech Acts (RSA) for pragmatic program synthesis by proposing an amortization strategy that learns a single, global ranking over all programs. This global ranking is distilled from a dataset of example-dependent rankings produced by simulated RSA interactions, and is used at inference time to rank the candidate programs returned by a fast, non-pragmatic synthesizer. The approach preserves much of RSA's communicative accuracy, achieves orders-of-magnitude speedups, and is proven exact in the single-example setting. Real-user and simulated replay experiments in the regex and animals domains indicate that the method is practical for interactive, real-time pragmatic synthesis without requiring human-labeled data. The work also provides a formal theorem showing that RSA_single can be exactly captured by a global ranking, and discusses limitations and avenues for future improvement, including whether effective global rankings exist in more complex settings.

Abstract

The Rational Speech Acts (RSA) framework has been successful in building \emph{pragmatic} program synthesizers that return programs which, in addition to being logically consistent with user-generated examples, account for the fact that a user chooses their examples informatively. We present a general method of amortizing the slow, exact RSA synthesizer. Our method first queries the exact RSA synthesizer to compile a communication dataset, which contains a number of example-dependent rankings of subsets of programs. It then distills a \textit{single} global ranking of all programs as an approximation to every ranking in the dataset. This global ranking is then used at inference time to rank multiple logically consistent candidate programs generated from a fast, non-pragmatic synthesizer. Experiments on two program synthesis domains show that our ranking method yields speedups of orders of magnitude over the exact RSA synthesizer, while being more accurate than a non-pragmatic synthesizer when communicating with humans. Finally, we prove that in the special case of synthesis from a single example, this approximation is exact.
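
The pipeline described in the abstract (collect example-dependent rankings, distill one global ranking, then filter and sort at inference time) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the three toy regexes, the hand-written rankings standing in for RSA output, and the function names `distill_global_ranking` and `ranked_synthesize`. The distillation step uses a simple net-pairwise-wins count, which is a stand-in heuristic and not necessarily the procedure used in the paper.

```python
import re
from itertools import combinations

def distill_global_ranking(rankings, programs):
    """Order programs by net pairwise wins across all rankings.

    Each ranking is a list of programs, best first. This counting scheme is
    an illustrative stand-in for the paper's distillation step.
    """
    score = {p: 0 for p in programs}
    for ranking in rankings:
        # combinations() preserves list order, so `better` precedes `worse`.
        for better, worse in combinations(ranking, 2):
            score[better] += 1
            score[worse] -= 1
    # Highest score first; ties are broken by program text for determinism.
    return sorted(programs, key=lambda p: (-score[p], p))

def is_consistent(program, example):
    """A regex is consistent with (string, label) if full-match equals label."""
    string, label = example
    return (re.fullmatch(program, string) is not None) == label

def ranked_synthesize(examples, programs, global_ranking):
    """Use examples only to filter; use the global ranking only to sort."""
    position = {p: i for i, p in enumerate(global_ranking)}
    candidates = [p for p in programs
                  if all(is_consistent(p, ex) for ex in examples)]
    return sorted(candidates, key=position.__getitem__)

# Hypothetical program space and hand-written example-dependent rankings.
programs = ["a+", "a*", "[ab]+"]
rankings = [["a+", "a*"], ["a+", "[ab]+"], ["a*", "[ab]+"]]

global_ranking = distill_global_ranking(rankings, programs)
top_choices = ranked_synthesize([("aa", True), ("", False)],
                                programs, global_ranking)
print(global_ranking)
print(top_choices)
```

Note that the examples never influence the order of the surviving candidates, only which candidates survive. That separation is what makes inference cheap: the ranking is computed once, offline.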

Paper Structure (50 sections, 19 equations, 12 figures, 2 algorithms)

Figures (12)

  • Figure 1: (left) Directly using the exact RSA algorithm in a pragmatic synthesizer $L_1$ is slow. (right) Our approach uses RSA to generate a simulated communication dataset between the informative speaker $S_1$ and the pragmatic synthesizer $L_1$, and stores the responses of $L_1$ as example-dependent rankings of subsets of programs. We then distill the dataset into a single example-agnostic global ranking of all programs $\sigma[w]$. This global ranking is then used to build a fast pragmatic synthesizer $L_\sigma$, which uses the examples only to filter for consistent programs and then sorts them by the global ranking. This amortized synthesizer selects programs much as an exact RSA synthesizer does, while being orders of magnitude faster.
  • Figure 2: A boolean lexicon for a small reference game of regular expressions. The rows are the utterances (strings) and the columns are hypotheses (regexes), and each entry denotes if a string is consistent with a regex. The $L_0$ and $L_1$ matrices show conditional probabilities that would be inferred by a synthesizer performing literal and pragmatic inference respectively.
  • Figure 3: In the case of incremental RSA, the meaning matrix becomes smaller as more utterances are given, as each utterance rules out hypotheses that are inconsistent with it.
  • Figure 4: Grammar for the regex domain
  • Figure 5: Success rate of the literal $L_0$ and ranking-based $L_\textrm{anneal}$ synthesizers at inferring the correct regex, as a function of the number of examples given (turn). $L_\textrm{anneal}$ achieves a success rate of 93.75%, whereas $L_0$ achieves only 65.63%. The ranking-based synthesizer also achieves higher success with fewer utterances. Bands indicate 95% CI over 24 regexes for each condition.
  • ...and 7 more figures
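
The literal and pragmatic inference mentioned in the Figure 2 caption can be sketched on a tiny boolean lexicon. This is a minimal sketch of the standard RSA recursion under assumed uniform priors and a speaker rationality of 1. The 2x2 lexicon below is hypothetical and is not the one shown in the figure, and `rsa_listeners` is an illustrative name.

```python
def normalize_rows(matrix):
    """Scale each row to sum to 1; all-zero rows stay all zero."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def rsa_listeners(lexicon):
    """Return (L0, S1, L1) for a boolean lexicon indexed [utterance][hypothesis].

    L0[u][h]: literal listener, normalized over hypotheses for each utterance.
    S1[h][u]: informative speaker, normalized over utterances for each hypothesis.
    L1[u][h]: pragmatic listener, normalized over hypotheses for each utterance.
    """
    l0 = normalize_rows(lexicon)
    s1 = normalize_rows(transpose(l0))
    l1 = normalize_rows(transpose(s1))
    return l0, s1, l1

# Hypothetical lexicon: rows are utterances, columns are hypotheses.
# Utterance 0 is consistent with both hypotheses; utterance 1 only with h1.
lexicon = [
    [1, 1],
    [0, 1],
]

l0, s1, l1 = rsa_listeners(lexicon)
print(l0[0])  # literal listener cannot tell h0 from h1 given utterance 0
print(l1[0])  # pragmatic listener prefers h0 given utterance 0
```

In this toy case the literal listener splits utterance 0 evenly between the two hypotheses, while the pragmatic listener favors h0, reasoning that a speaker who meant h1 would more likely have chosen utterance 1. Incremental RSA, as in the Figure 3 caption, would rerun the same recursion on the columns that remain consistent after each utterance.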