Leveraging Large Language Models for Code Translation and Software Development in Scientific Computing
Akash Dhruv, Anshu Dubey
TL;DR
The paper tackles the productivity gap in translating legacy Fortran scientific codes to modern C++ for HPC interoperability. It introduces CodeScribe, a four-command workflow (Index, Inspect, Draft, Translate) that uses seed prompts and retrieval-augmented generation (RAG) to guide AI-assisted translation while requiring human verification, and it explicitly handles the difference between 1-based Fortran and 0-based C++ indexing when translating data structures. Using MCFM and other targets such as Noah-MP and ERF, the work quantitatively compares several configurations and analyzes the impact of RAG on translation quality. The results show that GPT-4o offers the strongest overall performance, although correctness challenges persist. The authors suggest integration with frameworks like LASSI to broaden applicability and reliability in scientific workflows.
Abstract
The emergence of foundational models and generative artificial intelligence (GenAI) is poised to transform productivity in scientific computing, especially in code development, refactoring, and translating from one programming language to another. However, because the output of GenAI cannot be guaranteed to be correct, manual intervention remains necessary. Some of this intervention can be automated through task-specific tools, alongside additional methodologies for correctness verification and effective prompt development. We explored the application of GenAI in assisting with code translation, language interoperability, and codebase inspection within a legacy Fortran codebase used to simulate particle interactions at the Large Hadron Collider (LHC). In the process, we developed a tool, CodeScribe, which combines prompt engineering with user supervision to establish an efficient process for code conversion. In this paper, we demonstrate how CodeScribe assists in converting Fortran code to C++, generating Fortran-C APIs for integrating legacy systems with modern C++ libraries, and providing developer support for code organization and algorithm implementation. We also address the challenges of AI-driven code translation and highlight its benefits for enhancing productivity in scientific computing workflows.
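To make the Fortran-to-C++ interoperability concern concrete, the following is a minimal sketch of what a hand-verified translation target might look like. It is not CodeScribe output and not code from MCFM; the names `FortranView2D` and `scale_matrix` are hypothetical and chosen only for illustration. The sketch assumes a Fortran array passed by reference in column-major order and shows one way to preserve 1-based indexing on the C++ side behind a C-linkage entry point.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the paper): a non-owning 2D view that keeps
// Fortran's 1-based, column-major indexing on top of a raw C++ pointer.
struct FortranView2D {
    double* data;
    int n1;  // extent of the first (fastest-varying) dimension
    int n2;  // extent of the second dimension

    double& operator()(int i, int j) const {
        assert(i >= 1 && i <= n1 && j >= 1 && j <= n2);
        // Shift both indices to 0-based, then apply the column-major offset.
        const std::size_t row = static_cast<std::size_t>(i - 1);
        const std::size_t col = static_cast<std::size_t>(j - 1);
        return data[row + static_cast<std::size_t>(n1) * col];
    }
};

// Hypothetical C-linkage entry point. A Fortran caller would declare a matching
// interface with bind(C) and mark the scalar arguments with the value attribute.
extern "C" void scale_matrix(double* a, int n1, int n2, double factor) {
    FortranView2D v{a, n1, n2};
    for (int j = 1; j <= n2; ++j) {      // loop bounds mirror the Fortran original
        for (int i = 1; i <= n1; ++i) {
            v(i, j) *= factor;
        }
    }
}
```

Keeping the index shift inside a single accessor means the translated loops can retain the Fortran bounds, which makes a side-by-side review against the original source simpler than rewriting every loop to start at zero.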
