Unraveling the Potential of Large Language Models in Code Translation: How Far Are We?

Qingxiao Tao; Tingrui Yu; Xiaodong Gu; Beijun Shen

TL;DR

A large-scale empirical study of the capabilities and limitations of LLMs in code translation, which also proposes two methods: intermediary translation, which routes the translation through an intermediary language between the source and target languages, and self-training, which fine-tunes LLMs on self-generated parallel data.

Abstract

While large language models (LLMs) exhibit state-of-the-art performance in various tasks, recent studies have revealed that they struggle with code translation. This is because they have not been extensively pre-trained on parallel multilingual code, which code translation heavily depends on. Moreover, existing benchmarks cover only a limited subset of common programming languages and thus cannot reflect the full potential of LLMs in code translation. In this paper, we conduct a large-scale empirical study to explore the capabilities and limitations of LLMs in code translation tasks. We first craft a novel benchmark called PolyHumanEval by extending HumanEval to a multilingual benchmark of 14 languages. With PolyHumanEval, we then perform over 110,000 translations with state-of-the-art code LLMs. The results show that LLMs perform suboptimally when translating from Python to other languages, and that widely adopted LLM optimization techniques such as conventional pre-training and instruction tuning have a negligible impact on code translation. To further uncover the potential of LLMs in code translation, we propose two methods: (1) intermediary translation, which selects an intermediary language between the source and target languages; and (2) self-training, which fine-tunes LLMs on self-generated parallel data. Evaluated with CodeLlama-13B, our approach yields an average improvement of 11.7% in computational accuracy on Python-to-other translations. Notably, we find that Go can serve as a lingua franca for translating between any two of the studied languages.
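To make the intermediary-translation idea and the reported metric concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a hypothetical `translate(code, src, tgt)` callable that wraps whatever LLM is used, and it takes computational accuracy to be the fraction of translated programs that pass all of their test cases; the names `translate_via_pivot` and `computational_accuracy` are illustrative and do not come from the paper.

```python
from typing import Callable, Sequence

# Hypothetical interface (an assumption, not the paper's API):
# (source_code, source_language, target_language) -> translated code.
Translator = Callable[[str, str, str], str]


def translate_via_pivot(
    code: str, src: str, tgt: str, pivot: str, translate: Translator
) -> str:
    """Translate src -> pivot -> tgt instead of src -> tgt directly."""
    # If the pivot is already one of the endpoints, a detour adds nothing.
    if pivot in (src, tgt):
        return translate(code, src, tgt)
    intermediate = translate(code, src, pivot)
    return translate(intermediate, pivot, tgt)


def computational_accuracy(passed: Sequence[bool]) -> float:
    """Fraction of translated programs that pass all of their test cases."""
    if not passed:
        raise ValueError("need at least one translation result")
    return sum(passed) / len(passed)
```

With a real model behind `translate`, a call such as `translate_via_pivot(py_code, "python", "rust", "go", translate)` would route a Python-to-Rust translation through Go, the pivot language the abstract singles out. Keeping the translator as a parameter lets the same routine be compared against a direct translation from the same model.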
