His2Trans: A Skeleton First Framework for Self Evolving C to Rust Translation with Historical Retrieval

Shengbo Wang, Mingwei Liu, Guangsheng Ou, Yuwen Chen, Zike Li, Yanlin Wang, Zibin Zheng

TL;DR

His2Trans combines a deterministic, build-aware skeleton with self-evolving knowledge extraction to support stable, incremental C-to-Rust migration, and it cuts repair overhead on unseen tasks by about 60%.

Abstract

Automated C-to-Rust migration encounters systemic obstacles when scaling from code snippets to industrial projects, mainly because build context is often unavailable ("dependency hell") and domain-specific evolutionary knowledge is missing. As a result, current LLM-based methods frequently cannot reconstruct precise type definitions under complex build systems or infer idiomatic API correspondences, which in turn leads to hallucinated dependencies and unproductive repair loops. To tackle these issues, we introduce His2Trans, a framework that combines a deterministic, build-aware skeleton with self-evolving knowledge extraction to support stable, incremental migration. On the structural side, His2Trans performs build tracing to create a compilable Project-Level Skeleton Graph, providing a strictly typed environment that separates global verification from local logic generation. On the cognitive side, it derives fine-grained API and code-fragment rules from historical migration traces and uses a Retrieval-Augmented Generation (RAG) system to steer the LLM toward idiomatic interface reuse. Experiments on industrial OpenHarmony modules show that His2Trans reaches a 99.75% incremental compilation pass rate, effectively fixing build failures where baselines struggle. On general-purpose benchmarks, it lowers the unsafe code ratio by 23.6 percentage points compared to C2Rust while producing the fewest warnings. Finally, knowledge accumulation studies demonstrate the framework's evolutionary behavior: by continuously integrating verified patterns, His2Trans cuts repair overhead on unseen tasks by about 60%.
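
To make the retrieval step described in the abstract more concrete, here is a minimal sketch of looking up mined API rules for a C function and rendering them as prompt context. It is not the authors' implementation: the `ApiRule` type, the functions `identifiers`, `retrieve`, and `prompt_context`, and the example rules are all hypothetical, and plain identifier matching stands in for whatever retrieval method the paper actually uses.

```rust
use std::collections::BTreeSet;

/// Hypothetical mined rule: a C API name and a suggested Rust counterpart.
/// Real rules would come from historical migration traces; these are
/// illustrative placeholders only.
#[derive(Debug, Clone, PartialEq)]
struct ApiRule {
    c_api: &'static str,
    rust_hint: &'static str,
}

/// Split C source text into identifier-like tokens (letters, digits, `_`).
fn identifiers(c_source: &str) -> BTreeSet<&str> {
    c_source
        .split(|ch: char| !(ch.is_ascii_alphanumeric() || ch == '_'))
        .filter(|tok| !tok.is_empty())
        .collect()
}

/// Return the rules whose C API name occurs in the function being translated.
/// Assumption: exact identifier match is used here in place of a real
/// similarity-based retriever.
fn retrieve<'a>(rules: &'a [ApiRule], c_source: &str) -> Vec<&'a ApiRule> {
    let ids = identifiers(c_source);
    rules.iter().filter(|r| ids.contains(r.c_api)).collect()
}

/// Render the retrieved rules as a block of text for the LLM prompt.
fn prompt_context(hits: &[&ApiRule]) -> String {
    hits.iter()
        .map(|r| format!("- {} -> {}\n", r.c_api, r.rust_hint))
        .collect()
}
```

Given rules for `malloc`, `memcpy`, and `strlen`, calling `retrieve` on a C function that uses only `strlen` and `malloc` returns just those two rules, in the order they appear in the rule list. The prompt therefore carries only hints relevant to the function at hand.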

Paper Structure (41 sections, 9 figures, 7 tables)

Figures (9)

  • Figure 1: Overview of the C-to-Rust translation framework.
  • Figure 2: Prompt templates utilized for knowledge mining (Left) and function generation (Right).
  • Figure 3: Workflow of Knowledge Base Construction. The pipeline executes a coarse-to-fine mining process: File-Level Pairing $\to$ Function-Level Re-ranking $\to$ Rule Extraction (API & Fragment).
  • Figure 4: Workflow of Incremental Function Translation. The Topological Scheduler dispatches parallel tasks based on the strictly typed Shared Layer and Module Skeleton, followed by a hybrid Rule/LLM-Based repair loop.
  • Figure 5: Detailed performance metrics for domain-specific projects (RQ1). Darker shades denote better results; "--" indicates compilation failure.
  • ...and 4 more figures
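
The Figure 4 caption mentions a Topological Scheduler that dispatches parallel tasks. One common way to obtain such a schedule is to layer the dependency graph so that every function lands in a batch after all of its callees. The sketch below shows that layering under stated assumptions: the function name `schedule_batches` and the shape of the dependency map are hypothetical, and this is not taken from the paper's implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// `deps` maps each function to the functions it depends on.
/// Returns batches in translation order: items within one batch have no
/// unmet dependencies on each other, so they could be processed in parallel.
/// Returns `None` if the dependencies contain a cycle.
fn schedule_batches(deps: &BTreeMap<&str, Vec<&str>>) -> Option<Vec<Vec<String>>> {
    // For every node (including ones that appear only as a dependency),
    // track the set of dependencies that are not yet scheduled.
    let mut remaining: BTreeMap<String, BTreeSet<String>> = BTreeMap::new();
    for (node, ds) in deps {
        remaining.entry(node.to_string()).or_default();
        for d in ds {
            remaining.entry(d.to_string()).or_default();
            remaining.get_mut(*node).unwrap().insert(d.to_string());
        }
    }

    let mut batches = Vec::new();
    while !remaining.is_empty() {
        // Everything whose dependencies are all satisfied is ready now.
        let ready: Vec<String> = remaining
            .iter()
            .filter(|(_, unmet)| unmet.is_empty())
            .map(|(name, _)| name.clone())
            .collect();
        if ready.is_empty() {
            return None; // no progress possible: cyclic dependency
        }
        for name in &ready {
            remaining.remove(name);
        }
        // Mark the scheduled items as satisfied for the remaining nodes.
        for unmet in remaining.values_mut() {
            for name in &ready {
                unmet.remove(name);
            }
        }
        batches.push(ready);
    }
    Some(batches)
}
```

For a map in which `main` depends on `parse` and `log`, and `parse` depends on `log`, the batches come out as `log`, then `parse`, then `main`. Mutually recursive functions make the function return `None`, so a real scheduler would need a separate policy for cycles, for example grouping them into a single task.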