Table of Contents
Fetching ...

Verified Code Transpilation with LLMs

Sahil Bhatia, Jie Qiu, Niranjan Hasabnis, Sanjit A. Seshia, Alvin Cheung

TL;DR

This paper introduces LLMLift, a framework that uses large language models to generate both DSL-target code summaries and loop invariants in a Python-based intermediate representation, followed by SMT-based verification to guarantee functional equivalence with the source program. By translating the verified Python IR back to the target DSL with simple rewrite rules, LLMLift enables verified lifting without DSL-specific training data or handcrafted heuristics. Across four DSL domains (Spark, Domino, TACO, and tensor processing), LLMLift matches or surpasses state-of-the-art symbolic tools in accuracy and speed while drastically reducing implementation effort. The results demonstrate that LLMs, when paired with formal verification, can provide reliable, scalable verified transpilation for diverse DSLs. A two-phase generation (PS then Inv) generally outperforms a single-phase approach, underscoring the value of separating program summaries from invariants for complex DSLs.

Abstract

Domain-specific languages (DSLs) are integral to various software workflows. Such languages offer domain-specific optimizations and abstractions that improve code readability and maintainability. However, leveraging these languages requires developers to rewrite existing code using the specific DSL's API. While large language models (LLMs) have shown some success in automatic code transpilation, none of them provide any functional correctness guarantees on the transpiled code. Another approach for automating this task is verified lifting, which relies on program synthesis to find programs in the target language that are functionally equivalent to the source language program. While several verified lifting tools have been developed for various application domains, they are specialized for specific source-target languages or require significant expertise in domain knowledge to make the search efficient. In this paper, leveraging recent advances in LLMs, we propose an LLM-based approach (LLMLift) to building verified lifting tools. We use the LLM's capabilities to reason about programs to translate a given program into its corresponding equivalent in the target language. Additionally, we use LLMs to generate proofs for functional equivalence. We develop lifting-based compilers for {\em four different} DSLs targeting different application domains. Our approach not only outperforms previous symbolic-based tools in both the number of benchmarks transpiled and transpilation time, but also requires significantly less effort to build.

Verified Code Transpilation with LLMs

TL;DR

This paper introduces LLMLift, a framework that uses large language models to generate both DSL-target code summaries and loop invariants in a Python-based intermediate representation, followed by SMT-based verification to guarantee functional equivalence with the source program. By translating the verified Python IR back to the target DSL with simple rewrite rules, LLMLift enables verified lifting without DSL-specific training data or handcrafted heuristics. Across four DSL domains (Spark, Domino, TACO, and tensor processing), LLMLift matches or surpasses state-of-the-art symbolic tools in accuracy and speed while drastically reducing implementation effort. The results demonstrate that LLMs, when paired with formal verification, can provide reliable, scalable verified transpilation for diverse DSLs. A two-phase generation (PS then Inv) generally outperforms a single-phase approach, underscoring the value of separating program summaries from invariants for complex DSLs.

Abstract

Domain-specific languages (DSLs) are integral to various software workflows. Such languages offer domain-specific optimizations and abstractions that improve code readability and maintainability. However, leveraging these languages requires developers to rewrite existing code using the specific DSL's API. While large language models (LLMs) have shown some success in automatic code transpilation, none of them provide any functional correctness guarantees on the transpiled code. Another approach for automating this task is verified lifting, which relies on program synthesis to find programs in the target language that are functionally equivalent to the source language program. While several verified lifting tools have been developed for various application domains, they are specialized for specific source-target languages or require significant expertise in domain knowledge to make the search efficient. In this paper, leveraging recent advances in LLMs, we propose an LLM-based approach (LLMLift) to building verified lifting tools. We use the LLM's capabilities to reason about programs to translate a given program into its corresponding equivalent in the target language. Additionally, we use LLMs to generate proofs for functional equivalence. We develop lifting-based compilers for {\em four different} DSLs targeting different application domains. Our approach not only outperforms previous symbolic-based tools in both the number of benchmarks transpiled and transpilation time, but also requires significantly less effort to build.
Paper Structure (18 sections, 2 equations, 10 figures, 1 table)

This paper contains 18 sections, 2 equations, 10 figures, 1 table.

Figures (10)

  • Figure 1: Sequential source code in Java and semantics of DSL in IR
  • Figure 2: A high-level overview of our LLMLift framework for building verified lifting-based tools.
  • Figure 3: End-to-End Lifting Example
  • Figure 4: Prompt Structure
  • Figure 5: Program summary guessing prompt
  • ...and 5 more figures