Table of Contents
Fetching ...

Validated Code Translation for Projects with External Libraries

Hanliang Zhang, Arindam Sharma, Cristina David, Meng Wang, Brandon Paulsen, Daniel Kroening, Wenjia Ye, Taro Sekiyama

TL;DR

This paper tackles translating Go projects that rely on external libraries into Rust, addressing two core challenges: hallucination of external APIs by LLMs and the difficulty of validating semantic equivalence when library-defined types are opaque. It proposes a modular, retrieval-guided pipeline (CrossCrate) that first maps Go APIs to Rust crates using crate-scoped knowledge bases and import-resolution, then generates translations with API-informed prompts. For validation, the framework introduces a schema-backed Protobuf carrier ($T_{ ext{Proto}}$) to bridge Go and Rust, synthesizing adapters from public APIs and performing two-stage round-tripping before differential I/O testing on unit-test inputs. Experimental results on six real Go repositories with dependencies show substantial gains, achieving up to 100% compilation and equivalence and roughly a 2x average improvement over baselines, demonstrating practical viability for repository-scale translation with external libraries.

Abstract

Large Language Models (LLMs) have shown promise for program translation, particularly for migrating systems code to memory-safe languages such as Rust. However, existing approaches struggle when source programs depend on external libraries: LLMs frequently hallucinate non-existent target APIs and fail to generate call-enabling imports; moreover, validating semantic equivalence is challenging when the code manipulates opaque, library-defined types. We present a translation and validation framework for translating Go projects with external dependencies to Rust. Our approach combines (i) a retrieval mechanism that maps Go library APIs to Rust APIs, and (ii) a cross-language validation pipeline that establishes language interoperability in the presence of opaque library types by synthesising adapters exclusively from public library APIs, prior to validating I/O equivalence. We evaluate our system on six real-world Go repositories with non-trivial external dependencies. Our approach significantly increases both the compilation and equivalence success rate (up to 100% in the most dependency-heavy case; approx. 2x on average) by enabling validated translation that manipulate opaque, library-defined types.

Validated Code Translation for Projects with External Libraries

TL;DR

This paper tackles translating Go projects that rely on external libraries into Rust, addressing two core challenges: hallucination of external APIs by LLMs and the difficulty of validating semantic equivalence when library-defined types are opaque. It proposes a modular, retrieval-guided pipeline (CrossCrate) that first maps Go APIs to Rust crates using crate-scoped knowledge bases and import-resolution, then generates translations with API-informed prompts. For validation, the framework introduces a schema-backed Protobuf carrier () to bridge Go and Rust, synthesizing adapters from public APIs and performing two-stage round-tripping before differential I/O testing on unit-test inputs. Experimental results on six real Go repositories with dependencies show substantial gains, achieving up to 100% compilation and equivalence and roughly a 2x average improvement over baselines, demonstrating practical viability for repository-scale translation with external libraries.

Abstract

Large Language Models (LLMs) have shown promise for program translation, particularly for migrating systems code to memory-safe languages such as Rust. However, existing approaches struggle when source programs depend on external libraries: LLMs frequently hallucinate non-existent target APIs and fail to generate call-enabling imports; moreover, validating semantic equivalence is challenging when the code manipulates opaque, library-defined types. We present a translation and validation framework for translating Go projects with external dependencies to Rust. Our approach combines (i) a retrieval mechanism that maps Go library APIs to Rust APIs, and (ii) a cross-language validation pipeline that establishes language interoperability in the presence of opaque library types by synthesising adapters exclusively from public library APIs, prior to validating I/O equivalence. We evaluate our system on six real-world Go repositories with non-trivial external dependencies. Our approach significantly increases both the compilation and equivalence success rate (up to 100% in the most dependency-heavy case; approx. 2x on average) by enabling validated translation that manipulate opaque, library-defined types.
Paper Structure (52 sections, 2 theorems, 14 equations, 13 figures, 2 tables, 4 algorithms)

This paper contains 52 sections, 2 theorems, 14 equations, 13 figures, 2 tables, 4 algorithms.

Key Result

lemma thmcounterlemma

Assume the full round-trip property (Definition def:full-roundtrip) holds for the output type $O$ on all values in $\{f(v)\mid v\in\overline{i}\}$, and assume I/O equivalence holds for all $v\in\overline{i}$. Then for every $v\in\overline{i}$, i.e., on the observed inputs, $f$ produces the same outputs as the function $\mathcal{A}^{-}_{O}\circ g\circ \mathcal{A}_I$.

Figures (13)

  • Figure 1: Go function from ageage and its Rust translation
  • Figure 2: Go type RSAIdentity
  • Figure 3: Translated Rust type RsaIdentity
  • Figure 4: Definition of rsa.PrivateKey.
  • Figure 5: RAG-enhanced context in the translation prompts.
  • ...and 8 more figures

Theorems & Definitions (9)

  • definition thmcounterdefinition: Go-side round-trip property
  • definition thmcounterdefinition: Full round-trip property
  • definition thmcounterdefinition: I/O Equivalence
  • lemma thmcounterlemma: Go-level behavioural agreement on observed inputs
  • proof
  • definition thmcounterdefinition: Valid import path
  • definition thmcounterdefinition: Re-export resolution
  • lemma thmcounterlemma: Validity of generated import paths
  • proof : sketch