Rectifier: Code Translation with Corrector via LLMs

Xin Yin; Chao Ni; Tien N. Nguyen; Shaohua Wang; Xiaohu Yang

TL;DR

This work investigates the limitations of large language models for code translation and introduces Rectifier, a compact model fine-tuned on error patterns to serve as a universal, LLM-agnostic corrector. By collecting error-translating pairs from multiple LLMs on the CodeNet and AVATAR datasets, the authors train CodeT5+ to repair common translation mistakes and to generalize to unseen LLMs. Empirical results show that LLMs produce substantial compilation and functional errors, while Rectifier can fix a meaningful subset of these errors across languages and datasets, demonstrating cross-LLM robustness. The study also provides a detailed error taxonomy, case studies, and threat analyses, offering a practical path toward reliable, automated code migration tooling.

Abstract

Software migration is garnering increasing attention with the evolution of software and society. Early studies mainly relied on handcrafted translation rules to translate between two languages, a process that is error-prone and time-consuming. In recent years, researchers have begun to explore the use of pre-trained large language models (LLMs) in code translation. However, code translation is a complex task, and LLMs make mistakes when performing it. They all produce certain types of errors, which include (1) compilation errors, (2) runtime errors, (3) functional errors, and (4) non-terminating execution. We found that the root causes of these errors are very similar (e.g., failure to import packages, errors in loop boundaries, operator errors, and more). In this paper, we propose a general corrector, namely Rectifier, which is a micro and universal model for repairing translation errors. It learns from errors generated by existing LLMs and can be widely applied to correct errors generated by any LLM. The experimental results on translation tasks between C++, Java, and Python show that our model has effective repair ability, and cross experiments also demonstrate the robustness of our method.
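The four error categories named in the abstract can be made concrete with a small sketch. The code below is an illustration only, not the authors' evaluation harness: the function name `classify_translation`, the label strings, the use of a syntax check as a stand-in for compilation, and the timeout used to approximate non-termination are all assumptions introduced here. It takes a translated Python program, one test input, and the expected output, and returns one outcome label.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def classify_translation(source: str, stdin_text: str, expected_stdout: str,
                         timeout_s: float = 5.0) -> str:
    """Classify one translated Python program (hypothetical helper)."""
    # (1) Compilation error: Python has no separate compile step, so a
    # syntax check is used here as a stand-in (an assumption of this sketch).
    try:
        compile(source, "<translation>", "exec")
    except SyntaxError:
        return "compilation_error"

    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "translated.py"
        path.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # (4) Non-terminating execution, approximated by a timeout.
            return "non_terminating"

    # (2) Runtime error: the program started but exited abnormally.
    if proc.returncode != 0:
        return "runtime_error"
    # (3) Functional error: the program ran, but the output is wrong.
    if proc.stdout.strip() != expected_stdout.strip():
        return "functional_error"
    return "success"
```

A call such as `classify_translation("print(int(input()) + 2)", "21\n", "42")` would be labeled a functional error, since the program runs to completion but prints the wrong value. A timeout can only approximate non-termination: a slow but correct program would be mislabeled, so the threshold is a judgment call.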

Paper Structure (37 sections, 13 figures, 6 tables)

Introduction
Motivation Example
A Motivation Example
Key Ideas
Rectifier: Code Translation with Corrector via LLMs
Collection Phase
Task 1: Prompt Preparation
Task 2: Translation Collection
Task 3: Mistake Correction
Fine-Tune and Inference Phase
Experimental Methodology
Dataset Collection and Pre-Processing
Studied Baseline Models
Experimental Procedure
Empirical Results
...and 22 more sections

Figures (13)

Figure 1: Translate the Java code "atcoder_ABC169_D" in the AVATAR dataset into Python code
Figure 2: Prompt for ChatGPT and other models
Figure 3: Fine-tune a smaller model to be the general corrector
Figure 4: Proportion of translation results for each LLM
Figure 5: An example of the syntactic difference between languages
...and 8 more figures
