Scalable, Validated Code Translation of Entire Projects using Large Language Models

Hanliang Zhang, Cristina David, Meng Wang, Brandon Paulsen, Daniel Kroening

TL;DR

This work addresses the challenge of scalable, semantically validated code translation for entire projects using large language models. It introduces Oxidizer, a two-phase translation pipeline that first ensures type-compatibility and then validates I/O equivalence through execution-snapshot-guided testing, with feature mapping rules guiding the LLM's translation. The method translates Go projects to Rust with high compilation success and an average of 73% of functions validated as I/O equivalent, outperforming prior approaches that relied on either purely symbolic rules or unvalidated LLM outputs. The approach is demonstrated on seven real Go projects, which illustrates its practical relevance for large-scale software maintenance and modernization. These results suggest that full-project translation can require substantially less developer effort, and they highlight the importance of structured guidance and semantic validation in LLM-driven code translation.

Abstract

Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code. However, a significant limitation when using LLMs for code translation is scalability: existing works have shown a drop in translation success rates for code exceeding around 100 lines. We overcome this limitation by developing a modular approach to translation, where we partition the code into small code fragments which can be translated independently and semantically validated (that is, checking I/O equivalence). When this approach is applied naively, we discover that LLMs are unreliable when translating features of the source language that do not have a direct mapping to the target language, and that the LLM often gets stuck in repair loops when attempting to fix errors. To address these issues, we introduce two key concepts: (1) feature mapping, which integrates predefined translation rules with LLM-based translation to guide the LLM in navigating subtle language differences and producing semantically accurate code; and (2) type-compatibility, which facilitates localized checks at the function signature level to detect errors early, thereby narrowing the scope of potential repairs. We apply our approach to translating real-world Go codebases to Rust, demonstrating that we can consistently generate reliable Rust translations for projects up to 6,600 lines of code and 369 functions, with an average of 73% of functions successfully validated for I/O equivalence, considerably higher than any existing work.
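To make the idea of semantic validation concrete, the following is a minimal sketch of checking I/O equivalence by replaying recorded executions. It is an illustration under stated assumptions, not the pipeline described in the paper: the `Snapshot` and `Verdict` types, the `check_io_equivalence` function, and the example function `sum_translated` are all hypothetical names, and the sketch assumes that inputs and outputs of the original Go function have already been captured as plain integer values.

```rust
/// A recorded execution of the original (source-language) function:
/// the arguments it received and the value it returned.
/// Hypothetical structure, assumed for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub input: Vec<i64>,
    pub expected_output: i64,
}

/// Outcome of validating one translated function.
#[derive(Debug, PartialEq)]
pub enum Verdict {
    /// Every recorded snapshot was reproduced exactly.
    Equivalent,
    /// The translated function disagreed with a recorded snapshot.
    Mismatch { snapshot_index: usize, got: i64 },
    /// No snapshots were recorded, so nothing could be checked.
    Untested,
}

/// Replays each snapshot against the translated function and reports
/// the first disagreement, if any.
pub fn check_io_equivalence<F>(translated: F, snapshots: &[Snapshot]) -> Verdict
where
    F: Fn(&[i64]) -> i64,
{
    if snapshots.is_empty() {
        return Verdict::Untested;
    }
    for (snapshot_index, snapshot) in snapshots.iter().enumerate() {
        let got = translated(&snapshot.input);
        if got != snapshot.expected_output {
            return Verdict::Mismatch { snapshot_index, got };
        }
    }
    Verdict::Equivalent
}

/// A stand-in for a translated function (hypothetical example).
pub fn sum_translated(xs: &[i64]) -> i64 {
    xs.iter().sum()
}
```

Calling `check_io_equivalence(sum_translated, &snapshots)` yields `Verdict::Equivalent` only when every recorded input reproduces its recorded output. Keeping `Untested` separate from `Equivalent` avoids counting a function with no recorded executions as validated.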

Paper Structure

This paper contains 41 sections, 5 equations, 16 figures, 2 tables, 3 algorithms.

Figures (16)

  • Figure 1: Source Go code consisting of three files: globals.go, types.go, and validator.go
  • Figure 2: Incorrect Rust translation for the Go code in Figure 1
  • Figure 3: Correct Rust translation for the Go code in Figure 1
  • Figure 4: Simplified Go specification
  • Figure 5: Simplified Rust specification
  • ...and 11 more figures

Theorems & Definitions (5)

  • Definition 1
  • Definition 2: Type-compatible function signatures
  • Definition 3: Type-compatible project translation
  • Definition 4: I/O equivalence of global variable initialization
  • Definition 5: I/O equivalence of functions
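The definitions themselves are not reproduced in this outline. As a rough illustration of the kind of localized, signature-level check the abstract describes, here is a minimal sketch that compares a Go signature with a candidate Rust signature under a fixed type mapping. Everything in it is an assumption made for illustration: the mapping table in `map_go_type`, the `Signature` and `SigError` types, and the `check_signature` and `sig` helpers are hypothetical and are not taken from the paper.

```rust
/// Hypothetical mapping from a Go type name to the Rust type expected
/// in its place. These pairs are illustrative assumptions only.
pub fn map_go_type(go_ty: &str) -> Option<&'static str> {
    match go_ty {
        "int64" => Some("i64"),
        "float64" => Some("f64"),
        "bool" => Some("bool"),
        "string" => Some("String"),
        _ => None,
    }
}

/// A function signature reduced to type names (hypothetical structure).
#[derive(Debug, Clone, PartialEq)]
pub struct Signature {
    pub params: Vec<String>,
    pub ret: String,
}

/// Reasons a candidate Rust signature is rejected.
#[derive(Debug, PartialEq)]
pub enum SigError {
    ArityMismatch { expected: usize, found: usize },
    UnmappedGoType(String),
    TypeMismatch { position: usize, expected: String, found: String },
    ReturnMismatch { expected: String, found: String },
}

/// Convenience constructor for building signatures from string slices.
pub fn sig(params: &[&str], ret: &str) -> Signature {
    Signature {
        params: params.iter().map(|p| p.to_string()).collect(),
        ret: ret.to_string(),
    }
}

/// Checks the Rust signature against the Go signature without looking
/// at either function body, so an error is reported at the signature.
pub fn check_signature(go: &Signature, rust: &Signature) -> Result<(), SigError> {
    if go.params.len() != rust.params.len() {
        return Err(SigError::ArityMismatch {
            expected: go.params.len(),
            found: rust.params.len(),
        });
    }
    for (position, (g, r)) in go.params.iter().zip(&rust.params).enumerate() {
        let expected = map_go_type(g).ok_or_else(|| SigError::UnmappedGoType(g.clone()))?;
        if expected != r.as_str() {
            return Err(SigError::TypeMismatch {
                position,
                expected: expected.to_string(),
                found: r.clone(),
            });
        }
    }
    let expected_ret =
        map_go_type(&go.ret).ok_or_else(|| SigError::UnmappedGoType(go.ret.clone()))?;
    if expected_ret != rust.ret.as_str() {
        return Err(SigError::ReturnMismatch {
            expected: expected_ret.to_string(),
            found: rust.ret.clone(),
        });
    }
    Ok(())
}
```

For example, `check_signature(&sig(&["int64", "string"], "bool"), &sig(&["i64", "String"], "bool"))` returns `Ok(())`, while replacing `i64` with `i32` yields a `TypeMismatch` at position 0. Because the check never inspects function bodies, a failure points at a single signature rather than at the whole project.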