BabelCoder: Agentic Code Translation with Specification Alignment

Fazle Rabbi, Soumit Kanti Saha, Tri Minh Triet Pham, Song Wang, Jinqiu Yang

TL;DR

BabelCoder introduces a novel multi-agent framework for cross-language code translation that splits translation into specialized Translation, Test, and Refinement agents. It leverages NL-Specifications to semantically guide translation, combines SBFL and LLM-based scope estimation for targeted bug localization, and employs test-driven refinement to improve executability and robustness. Evaluated on Avatar, CodeNet, EvalPlus, and TransCoder across five languages, it achieves an average computational accuracy of 94.16%, outperforming four strong baselines in most cases. Ablation studies demonstrate the additive value of NL-Specification augmentation/validation and bug-scope estimation, underscoring the effectiveness of integrating specification-guided reasoning with iterative repair. The work suggests promising directions for scaling to repository-level translation and further enhancing semantic fidelity through richer control-flow information.

Abstract

As software systems evolve, developers increasingly work across multiple programming languages and often face the need to migrate code from one language to another. While automatic code translation offers a promising solution, it has long remained a challenging task. Recent advancements in Large Language Models (LLMs) have shown potential for this task, yet existing approaches remain limited in accuracy and fail to effectively leverage contextual and structural cues within the code. Prior work has explored translation and repair mechanisms, but lacks a structured, agentic framework where multiple specialized agents collaboratively improve translation quality. In this work, we introduce BabelCoder, an agentic framework that performs code translation by decomposing the task into specialized agents for translation, testing, and refinement, each responsible for a specific aspect such as generating code, validating correctness, or repairing errors. We evaluate BabelCoder on four benchmark datasets and compare it against four state-of-the-art baselines. BabelCoder outperforms existing methods by 0.5%-13.5% in 94% of cases, achieving an average accuracy of 94.16%.
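
To make the spectrum-based fault localization (SBFL) step mentioned in the TL;DR concrete, here is a minimal sketch of how failing and passing tests can be used to rank suspicious lines in a translated program. This summary does not state which suspiciousness formula BabelCoder uses, so the Ochiai metric is an assumption chosen for illustration, and the names `ochiai_scores`, `rank_lines`, and the coverage format are hypothetical.

```python
import math
from typing import Dict, List, Set


def ochiai_scores(coverage: Dict[str, Set[int]], failed: Set[str]) -> Dict[int, float]:
    """Score each executed line with the Ochiai suspiciousness metric.

    coverage maps a test name to the set of line numbers it executed.
    failed is the set of test names that failed.
    Ochiai is an assumed choice here, not necessarily the paper's formula.
    """
    # Count only failing tests that actually appear in the coverage data.
    total_failed = sum(1 for test in coverage if test in failed)
    all_lines: Set[int] = set()
    for executed in coverage.values():
        all_lines |= executed

    scores: Dict[int, float] = {}
    for line in all_lines:
        # ef / ep: failing / passing tests that executed this line.
        ef = sum(1 for t, cov in coverage.items() if t in failed and line in cov)
        ep = sum(1 for t, cov in coverage.items() if t not in failed and line in cov)
        denom = math.sqrt(total_failed * (ef + ep))
        scores[line] = ef / denom if denom else 0.0
    return scores


def rank_lines(scores: Dict[int, float]) -> List[int]:
    """Return line numbers from most to least suspicious (ties by line number)."""
    return sorted(scores, key=lambda line: (-scores[line], line))


if __name__ == "__main__":
    # Hypothetical coverage for three tests; only t1 fails.
    coverage = {"t1": {1, 2, 3}, "t2": {1, 2, 4}, "t3": {1, 4}}
    ranking = rank_lines(ochiai_scores(coverage, failed={"t1"}))
    print(ranking)
```

In a pipeline like the one described, the top-ranked lines would be one signal for narrowing the bug scope, to be combined with the LLM-based scope estimate before the refinement step; how the two signals are merged is not specified in this summary.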

Paper Structure

This paper contains 29 sections, 8 figures, 4 tables.

Figures (8)

  • Figure 1: An example from the EvalPlus benchmark (liu2024your), where prior methods (yang2024exploring; pan2024lost) fail to correctly translate a Python max_element function to Java because they misinterpret its logic. BabelCoder's Refinement Agent (see the Refinement Agent section) generates a validated NL-Specification that guides accurate translation and bug fixing (see the Approach section), resulting in a correct and robust output.
  • Figure 2: Overview of BabelCoder
  • Figure 3: Prompt template used by Translation Agent to translate code. The {source_code} can be either the code in the source language or NL-Specification.
  • Figure 4: Prompt template for generating NL-Specification
  • Figure 5: Prompt template for aligning the NL-Specification with the translated code and refining the NL-Specification using an LLM.
  • ...and 3 more figures