Project-Level C-to-Rust Translation via Synergistic Integration of Knowledge Graphs and Large Language Models
Zhiqiang Yuan, Wenjun Mao, Zhuo Chen, Xiyue Shang, Chong Wang, Yiling Lou, Xin Peng
TL;DR
The paper tackles automatic, project-level translation of C code to safe, idiomatic Rust by addressing the project-wide pointer semantics that prior bottom-up, unit-by-unit approaches fail to capture. It introduces the C-Rust Pointer Knowledge Graph (KG) and PtrMapper, a workflow that uses the KG to guide LLM-generated translations with incremental compilation and error-corrective feedback. The KG encodes code-dependency relations, global pointer-usage behavior, and Rust-oriented ownership/mutability/nullability/lifetime annotations, enabling more accurate and safer Rust code across large projects. Evaluation on 16 Crown-set C projects demonstrates substantial gains in Rust idiomaticity and safety (e.g., dramatic reductions in lint and unsafe code) and improved functional correctness compared to baselines, with ablation showing the critical value of KG-guided semantics and error correction. Overall, PtrMapper offers a scalable, semantics-aware pathway for automated C-to-Rust translation that can significantly reduce manual effort while producing safer, more maintainable Rust code.
Abstract
Translating C code into safe Rust is an effective way to ensure its memory safety. Compared to rule-based translation, which produces Rust code that remains largely unsafe, LLM-based methods can generate more idiomatic and safer Rust code because LLMs have been trained on vast amounts of human-written idiomatic code. Although promising, existing LLM-based methods still struggle with project-level C-to-Rust translation. They typically partition a C project into smaller units (e.g., functions) based on call graphs and translate them bottom-up to resolve program dependencies. However, this bottom-up, unit-by-unit paradigm often fails to translate pointers because it lacks a global perspective on their usage. To address this problem, we propose a novel C-Rust Pointer Knowledge Graph (KG) that enriches a code-dependency graph with two types of pointer semantics: (i) pointer-usage information, which records global behaviors such as points-to flows and maps lower-level struct usage to higher-level units; and (ii) Rust-oriented annotations, which encode ownership, mutability, nullability, and lifetime. Combining the KG with LLMs, we further propose PtrMapper, which implements a project-level C-to-Rust translation technique. In PtrMapper, the KG provides LLMs with comprehensive pointer semantics from a global perspective, thus guiding LLMs towards generating safe and idiomatic Rust code from a given C project. Our experiments show that PtrMapper reduces unsafe usages in translated Rust by 99.9% compared to both rule-based translation and traditional LLM-based rewriting, while achieving an average 29.3% higher functional correctness than fuzzing-enhanced LLM methods.
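To make the idea of Rust-oriented pointer annotations concrete, the following is a minimal sketch of how such annotations might be stored per pointer and looked up per translation unit. It is not the authors' implementation: the names `PointerKg`, `PointerAnnotation`, `Ownership`, `context_for`, and `suggest_rust_type` are hypothetical, and the mapping from annotations to `Box`, `Option`, and reference types is an illustrative assumption rather than the rule set used by PtrMapper.

```rust
use std::collections::HashMap;

/// Whether a pointer owns its pointee (hypothetical annotation value).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ownership {
    Owning,
    Borrowed,
}

/// Rust-oriented annotations for one C pointer: ownership, mutability,
/// nullability, and an optional named lifetime (hypothetical schema).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PointerAnnotation {
    pub ownership: Ownership,
    pub mutable: bool,
    pub nullable: bool,
    pub lifetime: Option<String>,
}

/// A toy knowledge graph: annotated pointers plus "unit uses pointer" edges.
#[derive(Debug, Default)]
pub struct PointerKg {
    annotations: HashMap<String, PointerAnnotation>,
    uses: Vec<(String, String)>, // (code unit, pointer name)
}

impl PointerKg {
    /// Attach annotations to a pointer, replacing any earlier ones.
    pub fn annotate(&mut self, pointer: &str, ann: PointerAnnotation) {
        self.annotations.insert(pointer.to_string(), ann);
    }

    /// Record that a code unit (e.g. a function) uses a pointer.
    pub fn add_use(&mut self, unit: &str, pointer: &str) {
        self.uses.push((unit.to_string(), pointer.to_string()));
    }

    /// Collect the annotated pointers used by `unit`, in insertion order.
    /// Pointers without annotations are skipped.
    pub fn context_for(&self, unit: &str) -> Vec<(&str, &PointerAnnotation)> {
        let mut out = Vec::new();
        for (u, p) in &self.uses {
            if u.as_str() == unit {
                if let Some(ann) = self.annotations.get(p) {
                    out.push((p.as_str(), ann));
                }
            }
        }
        out
    }
}

/// Suggest a Rust type for a C pointer to `pointee`.
/// Assumed mapping: owning -> Box, borrowed -> & or &mut, nullable -> Option.
pub fn suggest_rust_type(pointee: &str, ann: &PointerAnnotation) -> String {
    let base = match ann.ownership {
        Ownership::Owning => format!("Box<{}>", pointee),
        Ownership::Borrowed => {
            let lifetime = match &ann.lifetime {
                Some(name) => format!("'{} ", name),
                None => String::new(),
            };
            let mutability = if ann.mutable { "mut " } else { "" };
            format!("&{}{}{}", lifetime, mutability, pointee)
        }
    };
    if ann.nullable {
        format!("Option<{}>", base)
    } else {
        base
    }
}
```

Under these assumptions, a nullable owning pointer to a `Node` would be suggested as `Option<Box<Node>>`, and the result of `context_for` is the kind of per-unit pointer context that could be placed in an LLM prompt before translating that unit.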
