Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis

Manish Shetty; Naman Jain; Adwait Godbole; Sanjit A. Seshia; Koushik Sen

TL;DR

Syzygy introduces a dual code-test translation pipeline that converts medium-to-large C codebases to safe Rust by combining LLM-driven code generation with dynamic specification mining and test-based equivalence checks. The approach translates code incrementally along the program dependency graph, guided by dynamic analysis to recover types, aliasing, and allocation sizes, while generating intermediate tests to validate each translation unit. Empirical evaluation on Zopfli and UrlParser demonstrates scalable translation and high equivalence, albeit with some runtime overhead and the need for manual repair in complex macros. The work provides a practical, test-driven path toward memory-safety guarantees in Rust for legacy C code, and outlines future directions for improving efficiency, handling more complex C constructs, and refining safety criteria.
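The TL;DR describes translating code incrementally along the program dependency graph. The sketch below shows one plausible way to derive such an order. It assumes that dependencies are caller-to-callee edges and that a function should be translated only after everything it calls, and it uses Kahn's topological sort. The function name `translation_order` and the call graph in `main` are hypothetical illustrations, not Syzygy's implementation or Zopfli's real functions.

```rust
use std::collections::{BTreeMap, BTreeSet, VecDeque};

/// Kahn's algorithm over a caller -> callees map (hypothetical helper).
/// A function becomes ready once all of its callees have been emitted,
/// so the result lists callees before their callers.
/// Returns None if the graph has a cycle (e.g. mutual recursion).
fn translation_order<'a>(deps: &BTreeMap<&'a str, Vec<&'a str>>) -> Option<Vec<&'a str>> {
    // Every function mentioned anywhere, including callees with no entry of their own.
    let mut nodes: BTreeSet<&'a str> = BTreeSet::new();
    for (&caller, callees) in deps {
        nodes.insert(caller);
        nodes.extend(callees.iter().copied());
    }

    // remaining[f] = number of distinct callees of f not yet emitted.
    let mut remaining: BTreeMap<&'a str, usize> = BTreeMap::new();
    // callers_of[g] = functions that call g.
    let mut callers_of: BTreeMap<&'a str, Vec<&'a str>> = BTreeMap::new();
    for &f in &nodes {
        let callees: BTreeSet<&'a str> = deps
            .get(f)
            .map(|v| v.iter().copied().collect())
            .unwrap_or_default();
        remaining.insert(f, callees.len());
        for g in callees {
            callers_of.entry(g).or_default().push(f);
        }
    }

    // Leaf functions (no callees) can be translated first.
    let mut ready: VecDeque<&'a str> = nodes.iter().copied().filter(|f| remaining[f] == 0).collect();
    let mut order = Vec::with_capacity(nodes.len());
    while let Some(f) = ready.pop_front() {
        order.push(f);
        // Emitting f unblocks one pending callee for each of its callers.
        for &caller in callers_of.get(f).into_iter().flatten() {
            let r = remaining.get_mut(caller).unwrap();
            *r -= 1;
            if *r == 0 {
                ready.push_back(caller);
            }
        }
    }
    // If some functions were never emitted, they sit on a cycle.
    if order.len() == nodes.len() { Some(order) } else { None }
}

fn main() {
    // Hypothetical call graph; names are illustrative only.
    let mut deps = BTreeMap::new();
    deps.insert("compress", vec!["deflate", "checksum"]);
    deps.insert("deflate", vec!["lz77", "huffman"]);
    deps.insert("lz77", vec!["hash"]);
    let order = translation_order(&deps).expect("call graph should be acyclic");
    // prints ["checksum", "hash", "huffman", "lz77", "deflate", "compress"]
    println!("{:?}", order);
}
```

Translating callees first means each newly translated function can call already-translated (and already-tested) Rust code. Recursive functions make the graph cyclic. This sketch simply reports that case, and a full tool would need its own policy for translating such groups.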

Abstract

Despite its extensive use in high-performance, low-level systems programming, C is susceptible to vulnerabilities due to manual memory management and unsafe pointer operations. Rust, a modern systems programming language, offers a compelling alternative: its unique ownership model and type system ensure memory safety without sacrificing performance. In this paper, we present Syzygy, an automated approach to translating C to safe Rust. Our technique combines LLM-driven code and test translation, guided by execution information gathered through dynamic analysis. This paired translation runs incrementally over the program's code elements in dependency order while maintaining per-step correctness. Our approach offers new insights into combining the strengths of LLMs and dynamic analysis to scale code generation and pair it with testing. We apply our approach to translate Zopfli, a high-performance compression library with ~3000 lines of code and 98 functions. We validate the translation by testing its equivalence with the source C program on a set of inputs. To our knowledge, this is the largest automated and test-validated C to safe Rust code translation achieved so far.
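The abstract validates the translation by testing equivalence with the C program on a set of inputs, and the TL;DR mentions dynamic analysis recovering types and allocation sizes. The sketch below illustrates both ideas on a single hypothetical function. It assumes that dynamic analysis showed a pointer/length pair is only read and always in bounds, so the pair becomes a Rust slice. It also assumes that input/output pairs recorded from C runs are replayed against the Rust version. The C function `sum`, the `RecordedCall` type, `check_equivalence`, and the recorded values are all hypothetical; this is not Syzygy's test format.

```rust
/// Hypothetical C original (illustrative only):
///     long long sum(const int *xs, size_t n) {
///         long long s = 0;
///         for (size_t i = 0; i < n; i++) s += xs[i];
///         return s;
///     }
/// Assumption: dynamic analysis observed that `xs` is only read and always
/// points to at least `n` ints, so (xs, n) becomes a borrowed slice and the
/// length travels with the type.
fn sum(xs: &[i32]) -> i64 {
    // Widen each element before adding, mirroring C's promotion to long long.
    xs.iter().map(|&x| i64::from(x)).sum()
}

/// One input/output pair recorded while executing the C program (hypothetical).
struct RecordedCall {
    input: Vec<i32>,
    expected: i64,
}

/// Replays every recorded call against the Rust translation and returns the
/// indices of calls whose outputs differ (empty means all cases agree).
fn check_equivalence(cases: &[RecordedCall]) -> Vec<usize> {
    cases
        .iter()
        .enumerate()
        .filter(|(_, c)| sum(&c.input) != c.expected)
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    // Illustrative cases; a real harness would capture these from C executions.
    let cases = vec![
        RecordedCall { input: vec![], expected: 0 },
        RecordedCall { input: vec![1, 2, 3], expected: 6 },
        RecordedCall { input: vec![i32::MAX, i32::MAX], expected: 2 * i32::MAX as i64 },
    ];
    let mismatches = check_equivalence(&cases);
    println!("{} of {} recorded calls match", cases.len() - mismatches.len(), cases.len());
}
```

Returning mismatch indices rather than a single boolean keeps the failing inputs available. For example, they could be fed back when repairing a translation.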

Figures (16)