LLMigrate: Transforming "Lazy" Large Language Models into Efficient Source Code Migrators

Yuchen Liu; Junhao Hu; Yingdi Shan; Ge Li; Yanzhen Zou; Yihong Dong; Tao Xie

TL;DR

This paper introduces LLMigrate, a function-level C-to-Rust translation framework that mitigates LLM "laziness" (omitting portions of the target code) by decomposing large modules into small functions, translating each independently, and reintegrating them under call-graph guidance. The system combines function splitting, context probing, and a repair loop with rule-based support to produce safe, compilable Rust, keeping human edits below 15% of the final code on the Linux kernel modules evaluated. The results indicate that function-level translation improves correctness and safety relative to whole-module LLM translation, and that the Repair component further increases compilation success. The work argues for a pragmatic hybrid approach: LLMs generate idiomatic code, while static analysis and program repair make the migration of large codebases scalable and safe.

Abstract

Rewriting C code in Rust provides stronger memory safety, yet migrating large codebases such as the 32-million-line Linux kernel remains challenging. Rule-based translators (e.g., C2Rust) produce accurate but largely unsafe Rust programs, whereas recent Large Language Model (LLM) approaches produce more idiomatic, safe Rust programs but frequently exhibit "laziness", omitting significant portions of the target code. To address this issue, we present LLMigrate, an LLM-based C-to-Rust translation tool that splits modules into discrete functions, translates them individually, and then reintegrates them. LLMigrate uses static analysis to retain necessary context, pairs GPT-4o (a state-of-the-art LLM) with compiler-driven translation and program-repair techniques for complex core functions, and leverages call-graph-guided translation to ensure consistent interfaces. Evaluations on three representative Linux kernel modules (math, sort, and ramfs) show that LLMigrate requires modifying less than 15% of the target code, significantly outperforming a pure GPT-4o-based migration.
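As a rough illustration of what call-graph-guided, function-level translation could look like, the sketch below orders functions so that each callee is translated before its callers and hands the already-translated callee results to the translator as context. This is a minimal sketch under stated assumptions, not LLMigrate's implementation: the callee-first ordering, the function names in `CALL_GRAPH`, and the `translate` callback (standing in for an LLM call plus repair) are all hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical call graph: each function maps to the set of functions it calls.
CALL_GRAPH = {
    "sort_entry": {"compare", "swap"},
    "swap": {"swap_bytes"},
    "compare": set(),
    "swap_bytes": set(),
}


def translation_order(call_graph):
    """Return function names with every callee placed before its callers."""
    # TopologicalSorter expects node -> predecessors; treating callees as
    # predecessors yields a callee-first order. A recursive (cyclic) call
    # graph raises graphlib.CycleError and would need separate handling.
    return list(TopologicalSorter(call_graph).static_order())


def migrate(call_graph, translate):
    """Translate each function once, in callee-first order.

    `translate(name, context)` is an assumed callback; `context` maps each
    direct callee to its already-translated result, so the caller's
    translation can stay consistent with the callee's interface.
    """
    translated = {}
    for name in translation_order(call_graph):
        context = {callee: translated[callee] for callee in sorted(call_graph[name])}
        translated[name] = translate(name, context)
    return translated
```

A stub such as `migrate(CALL_GRAPH, lambda name, ctx: (name, sorted(ctx)))` is enough to see the ordering at work: `swap_bytes` is handled before `swap`, and both `compare` and `swap` are available as context when `sort_entry` is translated.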
