SmartC2Rust: Iterative, Feedback-Driven C-to-Rust Translation via Large Language Models for Safety and Equivalence

Momoko Shiraishi, Yinzhi Cao, Takahiro Shinagawa

TL;DR

SmartC2Rust tackles the memory-safety and semantic-equivalence challenges of translating legacy C code to Rust. It introduces an iterative, feedback-driven pipeline that uses compilation results, runtime tests, and macro-preservation metadata to refine translations of modular units that fit within limited LLM context windows. In an evaluation across 21 programs, it reduces unsafe constructs by 99.4% and achieves full semantic equivalence with high macro preservation, outperforming C2Rust, Crown, Laertes, and C2SaferRust. The approach demonstrates practical viability for migrating safety-critical C codebases to Rust and provides a scalable blueprint for LLM-assisted translation with repair loops.

Abstract

Memory safety vulnerabilities remain prevalent in today's software systems, and one promising way to mitigate them is to adopt memory-safe languages such as Rust. Because much legacy code is written in memory-unsafe C, there is strong motivation to translate it into Rust. Prior work has shown promise in using Large Language Models (LLMs) for such translation. However, significant challenges persist: the translated code often fails to compile, let alone reduces unsafe statements or preserves semantic functionality, owing to inherent limitations of LLMs such as limited token size and inconsistent outputs. In this paper, we design an automated C-to-Rust translation system, called SmartC2Rust, that segments C code and converts it to Rust with memory safety and semantic equivalence. The key insight is to iteratively refine the output Rust code with additional feedback, e.g., compilation errors, segmentation contexts, semantic discrepancies, and memory-unsafe statements. Such feedback gradually improves the quality of the generated Rust code, thus mitigating unsafety, inconsistency, and semantic issues. Our evaluation shows that SmartC2Rust significantly decreases the number of unsafe statements and outperforms prior work in security and semantic equivalence.
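
The feedback-driven refinement described in the abstract can be pictured as a small repair loop. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the names `Feedback`, `RefineResult`, `refine`, `toy_translate`, `compile_checker`, and `unsafe_checker` are all hypothetical, the translator is a stub standing in for an LLM call, and the checkers are string-based stand-ins for a real compiler run and an unsafe-statement counter.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Feedback:
    kind: str      # hypothetical labels, e.g. "compile_error" or "unsafe"
    message: str


@dataclass
class RefineResult:
    rust: str
    rounds: int        # number of repair rounds that were needed
    converged: bool    # True if every checker passed


# A checker returns feedback items for a candidate; an empty list means "pass".
Checker = Callable[[str], List[Feedback]]
# A translator maps (C source, previous attempt, feedback) to a new attempt.
Translator = Callable[[str, str, List[Feedback]], str]


def first_failure(rust: str, checkers: List[Checker]) -> List[Feedback]:
    """Run checkers in order and return feedback from the first one that fails."""
    for check in checkers:
        feedback = check(rust)
        if feedback:
            return feedback
    return []


def refine(c_source: str, translate: Translator,
           checkers: List[Checker], max_rounds: int = 5) -> RefineResult:
    """Translate once, then repair with feedback until all checkers pass."""
    rust = translate(c_source, "", [])
    for rounds_used in range(max_rounds + 1):
        feedback = first_failure(rust, checkers)
        if not feedback:
            return RefineResult(rust, rounds_used, True)
        if rounds_used == max_rounds:
            break
        rust = translate(c_source, rust, feedback)
    return RefineResult(rust, max_rounds, False)


# --- Toy stand-ins (assumptions for illustration only) ---

def toy_translate(c_source: str, previous: str, feedback: List[Feedback]) -> str:
    # Stub for an LLM call: the first attempt is marked unsafe,
    # and "unsafe" feedback triggers a trivial repair.
    if not feedback:
        return "unsafe fn add(a: i32, b: i32) -> i32 { a + b }"
    if any(f.kind == "unsafe" for f in feedback):
        return previous.replace("unsafe ", "")
    return previous


def compile_checker(rust: str) -> List[Feedback]:
    # Stand-in for invoking a real compiler and collecting its errors.
    if rust.strip().startswith(("fn ", "unsafe fn ")):
        return []
    return [Feedback("compile_error", "expected a function item")]


def unsafe_checker(rust: str) -> List[Feedback]:
    # Stand-in for counting unsafe statements in the candidate.
    count = rust.count("unsafe")
    if count:
        return [Feedback("unsafe", f"{count} unsafe occurrence(s) remain")]
    return []


if __name__ == "__main__":
    result = refine("int add(int a, int b) { return a + b; }",
                    toy_translate, [compile_checker, unsafe_checker])
    print(result)
```

Ordering the checkers so that compilation is examined before unsafe usage is a design choice of this sketch: a later-stage check is only meaningful once the earlier one passes. A fuller version would add a checker that compares runtime behavior against the original C program.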

Paper Structure

This paper contains 30 sections, 3 figures, 8 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of SmartC2Rust.
  • Figure 2: An example translation prompt.
  • Figure 3: An example of function execution flow.