Scalable, Validated Code Translation of Entire Projects using Large Language Models

Hanliang Zhang; Cristina David; Meng Wang; Brandon Paulsen; Daniel Kroening

TL;DR

This work addresses scalable, semantically validated code translation of entire projects using large language models. It introduces Oxidizer, a two-phase translation pipeline that first ensures type-compatibility and then validates I/O equivalence via execution-snapshot-guided testing, with predefined feature mapping rules guiding the LLM's translation. The method translates Go projects to Rust with high compilation success and an average of 73% of functions validated as I/O equivalent, outperforming prior approaches that relied on either purely symbolic rules or unvalidated LLM outputs. The approach is demonstrated on seven real Go projects, illustrating its practical value for large-scale software maintenance and modernization. These results suggest significant reductions in developer effort for full-project translations and highlight the importance of structured guidance and semantic validation in LLM-driven code translation.

Abstract

Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code. However, a significant limitation when using LLMs for code translation is scalability: existing works have shown a drop in translation success rates for code exceeding around 100 lines. We overcome this limitation by developing a modular approach to translation, where we partition the code into small code fragments which can be translated independently and semantically validated (that is, checking I/O equivalence). When this approach is applied naively, we discover that LLMs are unreliable when translating features of the source language that do not have a direct mapping to the target language, and that the LLM often gets stuck in repair loops when attempting to fix errors. To address these issues, we introduce two key concepts: (1) feature mapping, which integrates predefined translation rules with LLM-based translation to guide the LLM in navigating subtle language differences and producing semantically accurate code; and (2) type-compatibility, which facilitates localized checks at the function signature level to detect errors early, thereby narrowing the scope of potential repairs. We apply our approach to translating real-world Go codebases to Rust, demonstrating that we can consistently generate reliable Rust translations for projects up to 6,600 lines of code and 369 functions, with an average of 73% of functions successfully validated for I/O equivalence, considerably higher than any existing work.
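To make the idea of validating I/O equivalence concrete, the following is a minimal sketch, not the paper's implementation. It assumes that input/output pairs have been recorded from executions of the source function and replays them against a candidate translation. All names (`Snapshot`, `io_equivalent`, `validated_fraction`, `naive_div`, `mapped_div`) are hypothetical, and Python stands in for the target language purely for illustration. The division example mirrors the kind of subtle language difference a mapping rule is meant to catch: Go integer division truncates toward zero, whereas Python's `//` rounds toward negative infinity.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence


@dataclass(frozen=True)
class Snapshot:
    """One recorded call of the source function (hypothetical format)."""
    args: tuple
    expected: Any


def io_equivalent(candidate: Callable[..., Any], snapshots: Sequence[Snapshot]) -> bool:
    """Replay every snapshot; any mismatch or exception counts as a failure."""
    for snap in snapshots:
        try:
            if candidate(*snap.args) != snap.expected:
                return False
        except Exception:
            return False
    return True


def validated_fraction(
    candidates: Dict[str, Callable[..., Any]],
    snapshots_by_name: Dict[str, Sequence[Snapshot]],
) -> float:
    """Fraction of functions whose translation passes all of its snapshots.

    Functions without any snapshot are counted as not validated.
    """
    if not candidates:
        return 0.0
    passed = 0
    for name, fn in candidates.items():
        snaps = snapshots_by_name.get(name, [])
        if snaps and io_equivalent(fn, snaps):
            passed += 1
    return passed / len(candidates)


# Snapshots as they might be recorded from a Go function `a / b` on ints.
DIV_SNAPSHOTS = [Snapshot((7, 2), 3), Snapshot((-7, 2), -3)]


def naive_div(a: int, b: int) -> int:
    # Literal translation: Python floors, so -7 // 2 differs from Go's result.
    return a // b


def mapped_div(a: int, b: int) -> int:
    # Translation following an (assumed) rule: truncate toward zero.
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q
```

In this sketch the literal translation fails the replay on the negative input while the rule-guided one passes, so only one of the two candidates would count toward the validated fraction.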
