Table of Contents
Fetching ...

Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages

Federico Mora, Justin Wong, Haley Lepe, Sahil Bhatia, Karim Elmaaroufi, George Varghese, Joseph E. Gonzalez, Elizabeth Polgreen, Sanjit A. Seshia

TL;DR

This work proposes designing an intermediate language that LLMs "naturally" know how to use and which can be automatically compiled to a target VLPL, and introduces SPEAC, an approach that enables LLMs to generate syntactically valid code even for VLPLs.

Abstract

Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource Programming Languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools, tool-chains for legacy languages, and formal verification frameworks. Inspired by a technique called natural programming elicitation, we propose designing an intermediate language that LLMs "naturally" know how to use and which can be automatically compiled to a target VLPL. When LLMs generate code that lies outside of this intermediate language, we use compiler techniques to repair the code into programs in the intermediate language. Overall, we introduce \emph{synthetic programming elicitation and compilation} (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate the performance of SPEAC in a case study for the UCLID5 formal verification language and find that, compared to existing retrieval and fine-tuning baselines, SPEAC produces syntactically correct programs more frequently and without sacrificing semantic correctness.

Synthetic Programming Elicitation for Text-to-Code in Very Low-Resource Programming and Formal Languages

TL;DR

This work proposes designing an intermediate language that LLMs "naturally" know how to use and which can be automatically compiled to a target VLPL, and introduces SPEAC, an approach that enables LLMs to generate syntactically valid code even for VLPLs.

Abstract

Recent advances in large language models (LLMs) for code applications have demonstrated remarkable zero-shot fluency and instruction following on challenging code related tasks ranging from test case generation to self-repair. Unsurprisingly, however, models struggle to compose syntactically valid programs in programming languages unrepresented in pre-training, referred to as very low-resource Programming Languages (VLPLs). VLPLs appear in crucial settings, including domain-specific languages for internal tools, tool-chains for legacy languages, and formal verification frameworks. Inspired by a technique called natural programming elicitation, we propose designing an intermediate language that LLMs "naturally" know how to use and which can be automatically compiled to a target VLPL. When LLMs generate code that lies outside of this intermediate language, we use compiler techniques to repair the code into programs in the intermediate language. Overall, we introduce \emph{synthetic programming elicitation and compilation} (SPEAC), an approach that enables LLMs to generate syntactically valid code even for VLPLs. We empirically evaluate the performance of SPEAC in a case study for the UCLID5 formal verification language and find that, compared to existing retrieval and fine-tuning baselines, SPEAC produces syntactically correct programs more frequently and without sacrificing semantic correctness.
Paper Structure (15 sections, 6 figures, 1 table)

This paper contains 15 sections, 6 figures, 1 table.

Figures (6)

  • Figure 1: Partial task description from Lee and Seshia leeseshia (a) and partial output of Eudoxus in UCLID5 (b). We interpret sigG, sigR, sigY to represent green, red, and yellow light signals, respectively. The procedure step captures the transition relation of the state machine. state appears to be a bookkeeping variable that is used to track the cases in the task, and count represents a timer.
  • Figure 2: The SPEAC workflow. Users input $q$, a task in natural language, and $C$, a description of the intermediate language. The LLM takes these inputs and generates $p$, a program in $P$. We use formal techniques to repair $p$ and produce $p'$, a program in $C$ that possibly contains holes. If $p'$ does not contain holes, SPEAC applies $f$, a compiler from $C$ to the target language, $T$, and returns the result. Otherwise, SPEAC generates a new prompt, $q'$, and repeats by asking the LLM to fill in the holes.
  • Figure 3: Hypothetical LLM output with grounded theory codes as comments. Codes are determined and assigned manually.
  • Figure 4: Partial SPEAC prompt templates for first generation (a) and hole filling (b).
  • Figure 5: Partial response for task in Fig. \ref{['fig:running-task']} using gpt-3.5-turbo-0125 (a) and partial first repair (b). Line 5 in (a) declares state as a local variable of bit-vector type; lines 25, 33, and 42 use state as an integer. Line 3 in (b) repairs the type of state to be an integer.
  • ...and 1 more figures