APIRAT: Integrating Multi-source API Knowledge for Enhanced Code Translation with LLMs
Chaofan Wang, Guanjie Qiu, Xiaodong Gu, Beijun Shen
TL;DR
This paper addresses cross-language API mistranslation in LLM-based code translation. It introduces APIRAT, a retrieval-augmented method that integrates multi-source API knowledge through three techniques (API sequence retrieval, API sequence back-translation, and API mapping) to guide translation. Experiments on CodeNet and AVATAR show that APIRAT improves computational accuracy by 4% to 15.1% over existing LLM-based methods and generalizes across different LLM backbones, and an ablation study confirms the contribution of each knowledge component. The results indicate that supplying structured API knowledge to the model can benefit cross-language software migration and development workflows.
Abstract
Code translation is an essential task in software migration, multilingual development, and system refactoring. Recent advances in large language models (LLMs) have demonstrated significant potential for this task. However, prior studies have shown that LLMs often struggle with domain-specific code, particularly in resolving cross-lingual API mappings. To tackle this challenge, we propose APIRAT, a novel code translation method that integrates multi-source API knowledge. APIRAT employs three API knowledge augmentation techniques (API sequence retrieval, API sequence back-translation, and API mapping) to guide LLMs in translating code, ensuring both the correct structure of API sequences and the accurate usage of individual APIs. Extensive experiments on two public datasets, CodeNet and AVATAR, indicate that APIRAT significantly surpasses existing LLM-based methods, achieving improvements in computational accuracy ranging from 4% to 15.1%. Additionally, our evaluation across different LLMs demonstrates the generalizability of APIRAT. An ablation study further confirms the individual contribution of each API knowledge component, underscoring the effectiveness of our approach.
