Translating C To Rust: Lessons from a User Study

Ruishi Li, Bo Wang, Tianyu Li, Prateek Saxena, Ashish Kundu

TL;DR

This paper analyzes how hard it is to translate real-world C programs to Rust by conducting a user study with non-expert Rust users, showing that humans can produce memory-safe Rust translations with minimal overhead while current automatic C-to-Rust tools fall short. It reveals that users semantically lift C code to safe Rust, often employing zero-cost abstractions and static safety, and that multiple translation strategies exist for the same program. However, functional correctness gaps persist, as fuzzing exposes non-equivalence with the original C and reveals that automation still struggles with decomposition and integration. The findings identify concrete strategies for automatic translation, including data-type lifting, aliasing handling (elision/cloning), and struct-based global refactoring, and call for improved modeling of Rust data types, incremental decomposition, and robust last-mile verification. Overall, the work provides a nuanced view of human-guided translation as a valuable complement to automated approaches and offers actionable directions for developing more capable C-to-Rust translators.
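The three strategies named above can be pictured with a small sketch. It is only an illustration under stated assumptions, not code from the study: every name here (`sum`, `Counters`, `record`, `snapshot_then_clear`) is hypothetical, and the C patterns mentioned in the comments are generic examples rather than programs from the paper.

```rust
// Hypothetical illustration; none of these names come from the paper.

/// Data-type lifting: a C `int *buf, size_t len` pair becomes a slice,
/// so the length travels with the pointer and accesses are bounds-checked.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

/// Struct-based global refactoring: C globals such as
/// `static unsigned hits, misses;` are gathered into one struct
/// that is passed explicitly instead of living in mutable statics.
#[derive(Debug, Default, Clone, PartialEq)]
struct Counters {
    hits: u32,
    misses: u32,
}

fn record(counters: &mut Counters, hit: bool) {
    if hit {
        counters.hits += 1;
    } else {
        counters.misses += 1;
    }
}

/// Aliasing handled by cloning: where C would keep a second pointer
/// into the same buffer, the translation takes an owned copy, so the
/// original can be mutated without violating the borrow rules.
fn snapshot_then_clear(values: &mut Vec<i32>) -> Vec<i32> {
    let snapshot = values.clone();
    values.clear();
    snapshot
}

fn main() {
    let mut data = vec![1, 2, 3];
    println!("sum = {}", sum(&data));

    let mut counters = Counters::default();
    record(&mut counters, true);
    record(&mut counters, false);
    println!("{:?}", counters);

    let old = snapshot_then_clear(&mut data);
    println!("old = {:?}, now = {:?}", old, data);
}
```

Cloning trades a copy for simplicity; eliding the alias altogether, the other option the summary mentions, avoids that cost when the second pointer is not actually needed.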

Abstract

Rust aims to offer full memory safety for programs, a guarantee that untamed C programs do not enjoy. How difficult is it to translate existing C code to Rust? To get a complementary view from that of automatic C to Rust translators, we report on a user study asking humans to translate real-world C programs to Rust. Our participants are able to produce safe Rust translations, whereas state-of-the-art automatic tools are not able to do so. Our analysis highlights that the high-level strategy taken by users departs significantly from those of automatic tools we study. We also find that users often choose zero-cost (static) abstractions for temporal safety, which addresses a predominant component of runtime costs in other full memory safety defenses. User-provided translations showcase a rich landscape of specialized strategies to translate the same C program in different ways to safe Rust, which future automatic translators can consider.

Paper Structure

This paper contains 24 sections, 22 figures, 4 tables.

Figures (22)

  • Figure 1: A Rust code example showing the concepts of ownership transfer, mut./immut. borrowing, and reference lifetime.
  • Figure 2: (Top) An example of a C program with aliasing pointers, i.e., multiple pointers pointing to the same region of memory. (Bottom) A line-by-line translation by c2rust. Uses of unsafe raw pointers and APIs/types are highlighted in red.
  • Figure 3: A safe Rust translation of the C program (Version A)
  • Figure 4: Raw C pointers to Rust data types lifted in translations and the number of programs using each Rust type (in blue).
  • Figure 5: Breakdown of C library API translations with examples.
  • ...and 17 more figures