Generative Explanations for Program Synthesizers

Amirmohammad Nazari; Souti Chattopadhyay; Swabha Swayamdipta; Mukund Raghothaman

Generative Explanations for Program Synthesizers

Amirmohammad Nazari, Souti Chattopadhyay, Swabha Swayamdipta, Mukund Raghothaman

TL;DR

This paper develops an approach to reliably augment the implementation with explanatory names and finds that users significantly prefer the explanations produced by the technique to the baseline, and that the proposed names measurably help users in understanding the synthesized implementation.

Abstract

Despite great advances in program synthesis techniques, they remain algorithmic black boxes. Although they guarantee that when synthesis is successful, the implementation satisfies the specification, they provide no additional information regarding how the implementation works or the manner in which the specification is realized. One possibility to answer these questions is to use large language models (LLMs) to construct human-readable explanations. Unfortunately, experiments reveal that LLMs frequently produce nonsensical or misleading explanations when applied to the unidiomatic code produced by program synthesizers. In this paper, we develop an approach to reliably augment the implementation with explanatory names. We recover fine-grained input-output data from the synthesis algorithm to enhance the prompt supplied to the LLM, and use a combination of a program verifier and a second language model to validate the proposed explanations before presenting them to the user. Together, these techniques massively improve the accuracy of the proposed names, from 24% to 79% respectively. Through a pair of small user studies, we find that users significantly prefer the explanations produced by our technique (76% of responses indicating the appropriateness of the presenting names) to the baseline (with only 2% of responses approving of the suggestions), and that the proposed names measurably help users in understanding the synthesized implementation.

Generative Explanations for Program Synthesizers

TL;DR

Abstract

Paper Structure (35 sections, 1 theorem, 6 equations, 16 figures, 3 tables, 1 algorithm)

This paper contains 35 sections, 1 theorem, 6 equations, 16 figures, 3 tables, 1 algorithm.

Introduction
Contributions.
Overview and Motivating Example
The Baseline LLM
Prompt Expansion Using Subspecifications
Algorithmic Sanity Checks
Algorithmic Name Synthesis
Prompt Expansion Using Subspecifications
Algorithmic Sanity Checks
Optimizations
Retries.
Bottom-up name generation.
Experimental Evaluation
Benchmarks.
Baselines.
...and 20 more sections

Key Result

Lemma 3.1

Let $E$ be a set of input-output examples, and $f \models E$ be a conformant implementation. Let $g$ be a subroutine in $f$, with local input-output subspecification $E_g$. Pick a function $g'$ such that $g' \models E_g$. Then $f[g' / g] \models E$.

Figures (16)

Figure 1: Program produced by DreamCoder that sorts a list of numbers. The top-level function is $f$. Equation \ref{['eq:motiv:sort-spec']} is an excerpt of the specification supplied to the synthesizer. This program has been transliterated into Python from the original lambda-expression which may be found in the supplementary material.
Figure 2: Example prompt for name generation when extended with local input-output subspecifications.
Figure 3: Overall architecture of NomNom. We begin with a specification-implementation pair, $(E, f)$, and a subroutine $g$ of interest. The system alternates between querying a first LLM to obtain proposals $w$ for $g$ and validating $w$ by resynthesizing an alternative implementation $g'$ using a second language model.
Figure 4: Effectiveness of the algorithmic variants. $\operatorname{SS}$ indicates prompt expansion with local input-output subspecifications, $\operatorname{SS} + \operatorname{F}$ indicates the subsequent algorithmic sanity check, $(\operatorname{SS} + \operatorname{F})\lcirclearrowright$ indicates the version which repeatedly retries upon failure, and NomNom indicates our final system with bottom-up name generation. The bars represent the median of five independent executions.
Figure 5: Time needed and accuracy of participant responses when explaining how implementations worked (Study 1). The missing bars for the baseline correspond to questions that no participant was able to successfully answer.
...and 11 more figures

Theorems & Definitions (1)

Lemma 3.1

Generative Explanations for Program Synthesizers

TL;DR

Abstract

Generative Explanations for Program Synthesizers

Authors

TL;DR

Abstract

Table of Contents

Key Result

Figures (16)

Theorems & Definitions (1)