Scalable, Validated Code Translation of Entire Projects using Large Language Models
Hanliang Zhang, Cristina David, Meng Wang, Brandon Paulsen, Daniel Kroening
TL;DR
This work addresses the challenge of scalable, semantically correct code translation for entire projects using large language models. It introduces Oxidizer, a two-phase translation pipeline that first ensures type-compatibility and then validates I/O equivalence via execution-snapshot-guided testing, guided by feature mapping rules and robust type-compatibility checks. The method enables translating Go projects to Rust with high compilation success and an average of 73% of functions being I/O equivalent, outperforming prior approaches that relied on either purely symbolic rules or unvalidated LLM outputs. The approach is demonstrated on seven real Go projects, illustrating practical impact for large-scale, multi-language software maintenance and modernization. These results suggest significant reductions in developer effort for full-project translations and highlight the importance of structured guidance and semantic validation in LLM-driven code translation.
Abstract
Large language models (LLMs) show promise in code translation due to their ability to generate idiomatic code. However, a significant limitation when using LLMs for code translation is scalability: existing works have shown a drop in translation success rates for code exceeding around 100 lines. We overcome this limitation by developing a modular approach to translation, where we partition the code into small code fragments which can be translated independently and semantically validated (that is, checking I/O equivalence). When this approach is applied naively, we discover that LLMs are unreliable when translating features of the source language that do not have a direct mapping to the target language, and that the LLM often gets stuck in repair loops when attempting to fix errors. To address these issues, we introduce two key concepts: (1) feature mapping, which integrates predefined translation rules with LLM-based translation to guide the LLM in navigating subtle language differences and producing semantically accurate code; and (2) type-compatibility, which facilitates localized checks at the function signature level to detect errors early, thereby narrowing the scope of potential repairs. We apply our approach to translating real-world Go codebases to Rust, demonstrating that we can consistently generate reliable Rust translations for projects up to 6,600 lines of code and 369 functions, with an average of 73% of functions successfully validated for I/O equivalence, considerably higher than any existing work.
