From Legacy Fortran to Portable Kokkos: An Autonomous Agentic AI Workflow

Sparsh Gupta, Kamalavasan Kamalakkannan, Maxim Moraru, Galen Shipman, Patrick Diehl

TL;DR

The paper addresses the portability gap for legacy Fortran HPC codes on GPU-accelerated systems by introducing a fully autonomous agentic AI workflow that translates Fortran kernels to portable Kokkos C++ and optimizes them across hardware. Using specialized LLM agents, the pipeline handles translation, validation, compilation, execution, functionality testing, and iterative optimization, leveraging SLURM and Spack for reproducible HPC environments. Experimental results on NAS NPBench kernels and DGEMM show functionally correct and performance-portable Kokkos implementations, with proprietary GPT-5 and o4-mini-high generally succeeding where open-source LLMs struggle, and meaningful GFLOPS improvements observed in several kernels. The work demonstrates the feasibility and cost-effectiveness of autonomous Fortran modernization, offering a pathway to broaden access to next-generation HPC across diverse supercomputing platforms.

Abstract

Scientific applications continue to rely on legacy Fortran codebases originally developed for homogeneous, CPU-based systems. As High-Performance Computing (HPC) shifts toward heterogeneous GPU-accelerated architectures, many accelerators lack native Fortran bindings, creating an urgent need to modernize legacy codes for portability. Frameworks like Kokkos provide performance portability and a single-source C++ abstraction, but manual Fortran-to-Kokkos porting demands significant expertise and time. Large language models (LLMs) have shown promise in source-to-source code generation, yet their use in fully autonomous workflows for translating and optimizing parallel code remains largely unexplored, especially for performance portability across diverse hardware. This paper presents an agentic AI workflow where specialized LLM "agents" collaborate to translate, validate, compile, run, test, debug, and optimize Fortran kernels into portable Kokkos C++ programs. Results show the pipeline modernizes a range of benchmark kernels, producing performance-portable Kokkos codes across hardware partitions. Paid OpenAI models such as GPT-5 and o4-mini-high executed the workflow for only a few U.S. dollars, generating optimized codes that surpassed Fortran baselines, whereas open-source models like Llama4-Maverick often failed to yield functional codes. This work demonstrates the feasibility of agentic AI for Fortran-to-Kokkos transformation and offers a pathway for autonomously modernizing legacy scientific applications to run portably and efficiently on diverse supercomputers. It further highlights the potential of LLM-driven agentic systems to perform structured, domain-specific reasoning tasks in scientific and systems-oriented applications.

Paper Structure

This paper contains 30 sections, 5 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: Agentic AI workflow for autonomous Fortran-to-Kokkos translation, validation, compilation, runtime execution, functionality testing, and performance optimization. Fixer Agents are triggered on error events (e.g., failed compilation, runtime fault, or incorrect functionality testing output). Agent invocation limits at each stage are enforced via configurable thresholds (e.g., MAX_COMPILE_FIXES). The function tools invoked by the Build and Run agents use SLURM to schedule and monitor jobs and Spack to load the correct Kokkos environment on the hardware partitions. All artifacts and metrics are versioned and stored per run by a function tool.
  • Figure 2: Example NVIDIA Nsight Compute OPT suggestion
  • Figure 3: Total agent invocations on AMD MI250 across benchmark kernels for the entire autonomous workflow (baseline + optimization rounds). Bars indicate the number of invocations for build, runtime, and functionality agents for different LLMs. Multiple invocations are expected since the pipeline repeatedly (i) fixes compilation and runtime errors, (ii) verifies and ensures functional correctness, and (iii) performs iterative performance optimization. Higher counts indicate more fixing cycles required before achieving a correct and optimized Kokkos implementation.
  • Figure 4: OpenAI API token costs for GPT-5 and o4-mini-high across all kernels and partitions.
  • Figure 5: Optimization trajectory of the CG kernel measured in GFLOPS at maximum input size.
  • ...and 1 more figure
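
The Figure 1 caption describes Fixer Agents that are triggered on error events and capped by configurable thresholds such as MAX_COMPILE_FIXES. A minimal sketch of that bounded fix-and-retry pattern is given below. It is not the authors' implementation: the threshold value, the `StageResult` type, `run_stage_with_fixer`, and the toy build and fixer functions are all hypothetical stand-ins, and the real pipeline would call an LLM agent and a SLURM-scheduled build instead.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical value: the caption names this threshold but not its setting.
MAX_COMPILE_FIXES = 3


@dataclass
class StageResult:
    """Outcome of one stage check (build, run, or functionality test)."""
    ok: bool
    log: str = ""


def run_stage_with_fixer(
    code: str,
    check: Callable[[str], StageResult],
    fixer: Callable[[str, str], str],
    max_fixes: int,
) -> Tuple[str, bool, int]:
    """Check the code; on failure, invoke the fixer at most max_fixes times.

    Returns the final code, whether the stage passed, and the number of
    fixer invocations that were used.
    """
    invocations = 0
    result = check(code)
    while not result.ok and invocations < max_fixes:
        # The fixer sees the failing code and the error log, as an agent would.
        code = fixer(code, result.log)
        invocations += 1
        result = check(code)
    return code, result.ok, invocations


def toy_build(code: str) -> StageResult:
    """Stand-in for a build step: passes only if finalize is present."""
    if "Kokkos::finalize();" in code:
        return StageResult(True)
    return StageResult(False, "error: missing Kokkos::finalize()")


def toy_fixer(code: str, log: str) -> str:
    """Stand-in for a Fixer Agent: appends the missing call."""
    return code + "\nKokkos::finalize();"


if __name__ == "__main__":
    final_code, passed, used = run_stage_with_fixer(
        "Kokkos::initialize();", toy_build, toy_fixer, MAX_COMPILE_FIXES
    )
    print(passed, used)
```

Counting fixer invocations per stage, as this sketch does, is also what would let a workflow report totals like those plotted in Figure 3.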