APIRAT: Integrating Multi-source API Knowledge for Enhanced Code Translation with LLMs

Chaofan Wang, Guanjie Qiu, Xiaodong Gu, Beijun Shen

TL;DR

This paper addresses cross-language API mistranslation in LLM-based code translation. It introduces APIRAT, a retrieval-augmented framework that integrates multi-source API knowledge through API sequence retrieval, API sequence back-translation, and API mapping to guide translation. Experiments on CodeNet and AVATAR show that APIRAT improves computational accuracy by 4% to 15.1% over existing LLM-based methods and generalizes across different LLM backbones, and an ablation study confirms that each knowledge component contributes. The results indicate that adding structured API knowledge to prompts and retrieval pipelines benefits cross-language software migration and development workflows.

Abstract

Code translation is an essential task in software migration, multilingual development, and system refactoring. Recent advancements in large language models (LLMs) have demonstrated significant potential in this task. However, prior studies have highlighted that LLMs often struggle with domain-specific code, particularly in resolving cross-lingual API mappings. To tackle this challenge, we propose APIRAT, a novel code translation method that integrates multi-source API knowledge. APIRAT employs three API knowledge augmentation techniques, namely API sequence retrieval, API sequence back-translation, and API mapping, to guide LLMs in translating code, ensuring both the correct structure of API sequences and the accurate usage of individual APIs. Extensive experiments on two public datasets, CodeNet and AVATAR, indicate that APIRAT significantly surpasses existing LLM-based methods, achieving improvements in computational accuracy ranging from 4% to 15.1%. Additionally, our evaluation across different LLMs showcases the generalizability of APIRAT. An ablation study further confirms the individual contributions of each API knowledge component, underscoring the effectiveness of our approach.
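
To make the idea of API-knowledge-augmented prompting more concrete, the following is a minimal Python sketch of how the three knowledge sources named in the abstract might be combined into a single translation prompt. It is not the authors' implementation: the abstract does not specify the retrieval, back-translation, or prompt format, so the `ApiKnowledge` container, the `lookup_mappings` and `build_prompt` helpers, the prompt wording, and the example mapping table are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ApiKnowledge:
    """Holds the three knowledge sources named in the abstract.

    All field names are hypothetical; the paper may organize this differently.
    """
    retrieved_sequences: List[List[str]] = field(default_factory=list)
    back_translated_sequence: List[str] = field(default_factory=list)
    api_mapping: Dict[str, str] = field(default_factory=dict)


def lookup_mappings(source_apis: List[str], table: Dict[str, str]) -> Dict[str, str]:
    """Keep only the source APIs that have a known target-language counterpart."""
    return {api: table[api] for api in source_apis if api in table}


def build_prompt(source_code: str, knowledge: ApiKnowledge,
                 src_lang: str = "Java", tgt_lang: str = "Python") -> str:
    """Assemble a prompt that places available API knowledge before the source code.

    The wording and ordering here are assumptions, not the paper's prompt template.
    """
    lines = [f"Translate the following {src_lang} code to {tgt_lang}."]
    if knowledge.retrieved_sequences:
        lines.append("Similar target-language API sequences:")
        lines += ["- " + " -> ".join(seq) for seq in knowledge.retrieved_sequences]
    if knowledge.back_translated_sequence:
        lines.append("Suggested target API sequence: "
                     + " -> ".join(knowledge.back_translated_sequence))
    if knowledge.api_mapping:
        lines.append("API mappings:")
        lines += [f"- {src} => {tgt}" for src, tgt in knowledge.api_mapping.items()]
    lines.append(f"{src_lang} code:")
    lines.append(source_code)
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative mapping table; a real system would use a much larger resource.
    table = {"Math.max": "max", "ArrayList.add": "list.append"}
    knowledge = ApiKnowledge(
        retrieved_sequences=[["input", "max", "print"]],
        api_mapping=lookup_mappings(["Math.max", "Scanner.nextInt"], table),
    )
    print(build_prompt("int m = Math.max(a, b);", knowledge))
```

In this sketch, each knowledge source is optional, so the same function can also produce the reduced prompts that an ablation over individual components would need.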

Paper Structure

This paper contains 25 sections, 4 figures, 6 tables.

Figures (4)

  • Figure 1: An Example of Code Translation with LLMs
  • Figure 2: API Fault Patterns in Code Translation
  • Figure 3: Overview of ApiRAT
  • Figure 4: Performance with Different Parameter Settings