LLM-Assisted Translation of Legacy FORTRAN Codes to C++: A Cross-Platform Study

Nishath Rajiv Ranasinghe, Shawn M. Jones, Michal Kucer, Ayan Biswas, Daniel O'Malley, Alexander Buschmann Most, Selma Liliane Wanna, Ajay Sreekumar

TL;DR

The paper investigates LLM-assisted translation of legacy Fortran HPC codes to C++ using open-weight models across two platforms (vLLM and SambaNova Cloud). It introduces a cross-platform evaluation workflow that quantifies translation quality via CodeBLEU against human ground truth, measures compilation success, and assesses output fidelity against the original Fortran programs. Findings show larger LLMs generally improve CodeBLEU similarity, compilation accuracy, and output similarity, though variability persists and platform-specific error modes are observed. The work demonstrates the viability of open-weight LLMs for Fortran-to-C++ translation within a reproducible framework, while highlighting the need for human-in-the-loop and further enhancements (data, prompting, and iterative feedback) for mission-critical scientific software. Practical impact lies in providing a standardized, open framework to evaluate and guide LLM-assisted translation workflows across HPC environments.

Abstract

Large Language Models (LLMs) are increasingly being leveraged for generating and translating scientific computer codes by both domain experts and non-experts. Fortran has served as one of the go-to programming languages in legacy high-performance computing (HPC) for scientific discoveries. Despite growing adoption, LLM-based code translation of legacy codebases has not been thoroughly assessed or quantified for its usability. Here, we studied the applicability of LLM-based translation of Fortran to C++ as a step towards building an agentic workflow using open-weight LLMs on two different computational platforms. We statistically quantified the compilation accuracy of the translated C++ codes, measured the similarity of the LLM-translated code to the human-translated C++ code, and statistically quantified the output similarity of the Fortran to C++ translation.
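The three measurements named above (similarity to a human translation, compilation success, and output similarity) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The helper names (`total_bias`, `output_similarity`, `compile_and_run`), the use of `g++`, and the use of `difflib` as the output-similarity measure are all assumptions made for illustration; the source does not specify the compiler, flags, or similarity metric. The CodeBLEU score itself is taken as an input rather than computed here.

```python
import difflib
import shutil
import subprocess
import tempfile
from pathlib import Path


def total_bias(codebleu_score: float) -> float:
    """Total bias as defined in the Figure 3 caption: 1 - CodeBLEU score."""
    return 1.0 - codebleu_score


def output_similarity(fortran_out: str, cpp_out: str) -> float:
    """Similarity in [0, 1] between two program outputs.

    Assumption: a character-level difflib ratio stands in for whatever
    output-similarity measure the study actually used.
    """
    return difflib.SequenceMatcher(None, fortran_out, cpp_out).ratio()


def compile_and_run(cpp_source: str, compiler: str = "g++", timeout: int = 30):
    """Compile a translated C++ source and run it.

    Returns (compiled, text): on success, text is the program's stdout;
    on failure, text is the compiler's error output.
    The compiler name and the timeout are hypothetical defaults.
    """
    if shutil.which(compiler) is None:
        return False, f"{compiler} not found on PATH"
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "translated.cpp"
        exe = Path(tmp) / "translated"
        src.write_text(cpp_source)
        build = subprocess.run(
            [compiler, str(src), "-o", str(exe)],
            capture_output=True,
            text=True,
        )
        if build.returncode != 0:
            return False, build.stderr  # keep the error text for later categorization
        run = subprocess.run(
            [str(exe)], capture_output=True, text=True, timeout=timeout
        )
        return True, run.stdout
```

In a workflow of this shape, the stdout returned by `compile_and_run` would be passed to `output_similarity` together with the output of the original Fortran executable, and the captured compiler errors could feed an error-category breakdown like the one the paper reports.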

Paper Structure

This paper contains 17 sections, 6 figures, 2 tables.

Figures (6)

  • Figure 1: Regardless of LLM, our workflow evaluates several parts of the LLM's code translation, starting by comparing it to a human-translated ground truth with CodeBLEU, then moving to evaluate how well the translation compiles and executes. Finally, the workflow compares the output between the original Fortran code and the translated code's C++ executable.
  • Figure 2: The prompt used in this study.
  • Figure 3: Kernel density estimate plots of the total bias (1 - CodeBLEU score) for each Fortran translation show that the distribution differs by execution platform.
  • Figure 5: Each Fortran code is plotted along the x-axis, and the count of tries for a corresponding C++ translation is placed on the y-axis. Translations that compiled successfully are shown in green, and those that failed are marked in red. Note that the same Fortran code is not always shown at the same position on the x-axis. Compilation accuracy of each translated Fortran program differs per model, with some LLMs having more difficulty translating certain codes than others. We note that LLMs with a higher number of parameters have more success per Fortran code.
  • Figure 6: Distribution of compile error categories for each C++ translation shows that LLMs produce different errors in their translated code.
  • ...and 1 more figure