Evolving Triple Knowledge-Augmented LLMs for Code Translation in Repository Context
Guangsheng Ou, Mingwei Liu, Yuxuan Chen, Xueying Du, Shengbo Wang, Zekai Zhang, Xin Peng, Zibin Zheng
TL;DR
The paper tackles the challenge of repository-context code translation with LLMs by introducing K$^{\mathsf{3}}$Trans, a self-evolving framework that augments prompts with triple knowledge: dependency usage examples, target-language code samples, and successful translation function pairs. It constructs and continuously updates a multi-source knowledge base offline and retrieves the most relevant items online using BM25 and UniXcoder for re-ranking, followed by an LLM-based translation and self-debugging step. Empirical evaluation on the RustRepoTrans benchmark shows substantial gains over baselines across both execution-based and match-based metrics, with dependency usage examples delivering the largest impact and self-evolution providing ongoing improvements. The approach demonstrates practical potential for industrial software migration by improving accuracy, robustness, and adaptability to evolving repository contexts, while also identifying remaining challenges when target-language counterparts are missing in the translated repository.
Abstract
Large language models (LLMs) perform well on function-level code translation when no repository-level context is involved. However, their performance on code translation in repository-level context remains suboptimal because of complex dependencies and context, which hinders their adoption in industrial settings. In this work, we propose K-Trans, a novel LLM-based code translation technique that leverages triple knowledge augmentation to improve LLMs' translation quality under repository context in real-world software development. First, K-Trans constructs an evolving translation knowledge base by extracting relevant information from target-language codebases, the repository being translated, and prior translation results. Second, for each function to be translated, K-Trans retrieves relevant triple knowledge (target-language code samples, dependency usage examples, and successful translation function pairs) to serve as references for the LLM. Third, K-Trans constructs a knowledge-augmented translation prompt from the retrieved triple knowledge and uses LLMs to generate the translated code while preserving repository context. It then uses LLMs for self-debugging to improve translation correctness. Lastly, K-Trans continuously evolves the translation knowledge base. Experiments show that K-Trans substantially outperforms the baseline adapted from previous work, with relative improvements of 19.4%/40.2% in pass@1 and an improvement of 0.138 in CodeBLEU. The results also show that each type of knowledge contributes significantly to K-Trans's effectiveness in repository-level context code translation, with dependency usage examples making the most notable contribution. Moreover, as the self-evolution process progresses, the knowledge base continuously enhances the LLM's performance across various aspects of repository-level code translation.
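The retrieval and prompt-construction steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`bm25_scores`, `retrieve`, `build_prompt`), the prompt wording, and the Rust target are assumptions made for illustration, and a simple token-overlap score stands in for the UniXcoder embedding similarity used for re-ranking.

```python
import math
import re
from collections import Counter


def tokenize(code: str) -> list[str]:
    # Crude identifier/number tokenizer; a real system would use a proper lexer.
    return re.findall(r"[A-Za-z_]\w*|\d+", code.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of each document against the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / max(n, 1)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def overlap(a: str, b: str) -> float:
    # ASSUMPTION: Jaccard token overlap is only a placeholder for the
    # embedding-based similarity (UniXcoder) named in the TL;DR.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def retrieve(query: str, docs: list[str], recall_k: int = 3, top_k: int = 1) -> list[str]:
    """Recall candidates with BM25, then re-rank them with the similarity score."""
    scores = bm25_scores(query, docs)
    candidates = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:recall_k]
    reranked = sorted(candidates, key=lambda i: overlap(query, docs[i]), reverse=True)
    return [docs[i] for i in reranked[:top_k]]


def build_prompt(source_fn: str, knowledge: dict[str, list[str]]) -> str:
    """Assemble a knowledge-augmented prompt; the wording is hypothetical."""
    parts = ["Translate the following function to Rust."]
    for label, items in knowledge.items():
        if items:  # skip knowledge types with no retrieved items
            parts.append(f"### {label}\n" + "\n".join(items))
    parts.append("### Function to translate\n" + source_fn)
    return "\n\n".join(parts)


if __name__ == "__main__":
    usage_examples = [
        "let n = buf.len(); // usage of Buffer::len",
        "let cfg = Config::load(path)?;",
        "let s = fmt_name(user);",
    ]
    source_fn = "int buffer_len(Buffer *buf) { return buf->len; }"
    hits = retrieve(source_fn, usage_examples)
    print(build_prompt(source_fn, {"Dependency usage examples": hits}))
```

In this sketch each of the three knowledge types would be passed as a separate entry in `knowledge`, and the self-evolution step would correspond to appending verified translation pairs to the corpus searched by `retrieve`.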
