Table of Contents
Fetching ...

Guess & Sketch: Language Model Guided Transpilation

Celine Lee, Abdulrahman Mahmoud, Michal Kurek, Simone Campanoni, David Brooks, Stephen Chong, Gu-Yeon Wei, Alexander M. Rush

TL;DR

Guess&Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence of the transpilation input and output, and it successfully transpiles 57.6% more examples than GPT-4 and 39.

Abstract

Maintaining legacy software requires many software and systems engineering hours. Assembly code programs, which demand low-level control over the computer machine state and have no variable names, are particularly difficult for humans to analyze. Existing conventional program translators guarantee correctness, but are hand-engineered for the source and target programming languages in question. Learned transpilation, i.e. automatic translation of code, offers an alternative to manual re-writing and engineering efforts. Automated symbolic program translation approaches guarantee correctness but struggle to scale to longer programs due to the exponentially large search space. Their rigid rule-based systems also limit their expressivity, so they can only reason about a reduced space of programs. Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness. In this work, we leverage the strengths of LMs and symbolic solvers in a neurosymbolic approach to learned transpilation for assembly code. Assembly code is an appropriate setting for a neurosymbolic approach, since assembly code can be divided into shorter non-branching basic blocks amenable to the use of symbolic methods. Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence of the transpilation input and output. We test Guess & Sketch on three different test sets of assembly transpilation tasks, varying in difficulty, and show that it successfully transpiles 57.6% more examples than GPT-4 and 39.6% more examples than an engineered transpiler. We also share a training and evaluation dataset for this task.

Guess & Sketch: Language Model Guided Transpilation

TL;DR

Guess&Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence of the transpilation input and output, and it successfully transpiles 57.6% more examples than GPT-4 and 39.

Abstract

Maintaining legacy software requires many software and systems engineering hours. Assembly code programs, which demand low-level control over the computer machine state and have no variable names, are particularly difficult for humans to analyze. Existing conventional program translators guarantee correctness, but are hand-engineered for the source and target programming languages in question. Learned transpilation, i.e. automatic translation of code, offers an alternative to manual re-writing and engineering efforts. Automated symbolic program translation approaches guarantee correctness but struggle to scale to longer programs due to the exponentially large search space. Their rigid rule-based systems also limit their expressivity, so they can only reason about a reduced space of programs. Probabilistic neural language models (LMs) produce plausible outputs for every input, but do so at the cost of guaranteed correctness. In this work, we leverage the strengths of LMs and symbolic solvers in a neurosymbolic approach to learned transpilation for assembly code. Assembly code is an appropriate setting for a neurosymbolic approach, since assembly code can be divided into shorter non-branching basic blocks amenable to the use of symbolic methods. Guess & Sketch extracts alignment and confidence information from features of the LM then passes it to a symbolic solver to resolve semantic equivalence of the transpilation input and output. We test Guess & Sketch on three different test sets of assembly transpilation tasks, varying in difficulty, and show that it successfully transpiles 57.6% more examples than GPT-4 and 39.6% more examples than an engineered transpiler. We also share a training and evaluation dataset for this task.
Paper Structure (49 sections, 3 equations, 6 figures, 5 tables, 1 algorithm)

This paper contains 49 sections, 3 equations, 6 figures, 5 tables, 1 algorithm.

Figures (6)

  • Figure 1: In the Guess (top) phase, the full input sequence $x$ (blue) is passed to a trained language model (LM), which produces a candidate translation $y$ (orange), identifies potential mistakes (red), and extracts subsequence alignment (purple) from attention between the input and output (attn. map). In the Sketch (bottom) phase, aligned input and output subsequences are passed to a symbolic solver $\lambda$ to correct errors identified in the Guess phase. The final output $y'$ is constructed by recombining corrected subsequences.
  • Figure 2: True subsequence alignment (l), attention (r), and projected subsequence alignment (r) from the Guess model.
  • Figure 3: Test sets for transpilation. Length is measured as number of lines in the assembly file, and is averaged across both ARMv8 and RISC-V architectures under the -O0 optimization flag.
  • Figure 4: Example outputs.
  • Figure 5: Assembly instructions are mapped to Rosette instructions according to the semantics of the corresponding assembly hardware ISA (sample shown at top). Holes in the sequence (indicated in dashed red rectangles) are translated into Rosette symbolic constants. The resulting Rosette instructions, along with the input and output registers, are plugged into a Rosette function template (bottom) to generate a full Rosette program whose solution produces a corrected mapping from holes to values.
  • ...and 1 more figures