Proof-of-concept: Using ChatGPT to Translate and Modernize an Earth System Model from Fortran to Python/JAX

Anthony Zhou, Linnia Hawkins, Pierre Gentine

TL;DR

Earth system models are largely written in Fortran, which lacks a GPU runtime and is not differentiable; these gaps limit GPU acceleration and online learning. The authors introduce a semi-automated translation workflow that divides a Fortran codebase into units, translates each unit to Python/JAX using GPT-4, and validates each translation with unit tests. They demonstrate it on the leaf-level photosynthesis component of the Community Land Model (CLM) in CESM. The Python/JAX version supports automatic differentiation and gradient-based parameter estimation (e.g., estimating $V_{c,\max}$) and runs substantially faster than the original Fortran implementation (up to $\sim100\times$ on GPU). This work contributes an open-source translation pipeline and illustrates a feasible path toward differentiable, GPU-accelerated climate components that are more accessible to junior scientists.
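Dividing a Fortran codebase into units and ordering them for translation can be sketched as follows. This is a minimal sketch, assuming simple regular expressions in place of the parsing tool used in the paper: it splits the source into subroutines and functions, records which other units each one references, and uses Python's `graphlib.TopologicalSorter` to put every unit after the units it calls. Translating callees before callers is an assumed ordering, chosen so each unit can be tested against dependencies that are already translated. The toy Fortran module and the unit names other than `hybrid` are hypothetical.

```python
import re
from graphlib import TopologicalSorter

# Naive patterns standing in for a real Fortran parser.
DEF_RE = re.compile(r"\b(subroutine|function)\s+(\w+)", re.IGNORECASE)
END_RE = re.compile(r"end\s*(subroutine|function)\b", re.IGNORECASE)


def split_units(source: str) -> dict[str, str]:
    """Split Fortran source into {unit_name: body_text} for subroutines/functions."""
    units: dict[str, list[str]] = {}
    current = None
    for raw in source.splitlines():
        line = raw.split("!", 1)[0]  # drop comments (naive: ignores '!' inside strings)
        if END_RE.match(line.strip()):
            current = None  # closing 'end subroutine' / 'end function'
            continue
        match = DEF_RE.search(line)
        if match and current is None:
            current = match.group(2).lower()  # start of a new unit
            units[current] = []
        elif current is not None:
            units[current].append(line)  # line belongs to the current unit's body
    return {name: "\n".join(body) for name, body in units.items()}


def build_graph(units: dict[str, str]) -> dict[str, set[str]]:
    """Map each unit to the other units it references (its dependencies)."""
    return {
        name: {
            other
            for other in units
            if other != name
            and re.search(rf"\b{other}\s*\(|\bcall\s+{other}\b", body, re.IGNORECASE)
        }
        for name, body in units.items()
    }


def translation_order(graph: dict[str, set[str]]) -> list[str]:
    """Return units with every dependency listed before the units that call it."""
    return list(TopologicalSorter(graph).static_order())


# Hypothetical toy module; only the name `hybrid` comes from the paper.
SAMPLE = """
module toy_photosynthesis
contains
  subroutine hybrid(x0, p)
    real, intent(inout) :: x0
    integer, intent(in) :: p
    call ci_func(x0, p)          ! refine the estimate
  end subroutine hybrid

  subroutine ci_func(ci, p)
    real, intent(inout) :: ci
    integer, intent(in) :: p
    ci = ci * ft(25.0, 1.0)
  end subroutine ci_func

  real function ft(tl, ha)
    real, intent(in) :: tl, ha
    ft = exp(ha * tl)
  end function ft
end module toy_photosynthesis
"""

if __name__ == "__main__":
    print(translation_order(build_graph(split_units(SAMPLE))))  # ['ft', 'ci_func', 'hybrid']
```

Recursive routines would create a cycle in the graph, which `static_order()` reports by raising `graphlib.CycleError`; such units would need to be translated together.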

Abstract

Earth system models (ESMs) are vital for understanding past, present, and future climate, but they suffer from legacy technical infrastructure. ESMs are primarily implemented in Fortran, a language that poses a high barrier to entry for early-career scientists and lacks a GPU runtime, which has become essential for continued advancement as GPU power increases and CPU scaling slows. Fortran also lacks differentiability (the capacity to differentiate through numerical code), a capability that enables hybrid models integrating machine learning methods. Converting an ESM from Fortran to Python/JAX could resolve these issues. This work presents a semi-automated method for translating individual model components from Fortran to Python/JAX using a large language model (GPT-4). By translating the photosynthesis model from the Community Earth System Model (CESM), we demonstrate that the Python/JAX version runs up to 100x faster with GPU parallelization and enables parameter estimation via automatic differentiation. The Python code is also easy to read and run, so instructors could use it in the classroom. This work illustrates a path toward the ultimate goal of making climate models fast, inclusive, and differentiable.
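The gradient-based estimation of $V_{c,\max}$ can be illustrated with a minimal, dependency-free sketch. It assumes only the Rubisco-limited branch of a Farquhar-type model, $A = V_{c,\max}(C_i - \Gamma^*)/(C_i + K_c(1 + O/K_o)) - R_d$, with illustrative constants and synthetic A-Ci data; these are not CLM's parameterization or the paper's measurements, and all function names are hypothetical. Because $A$ is linear in $V_{c,\max}$ here, the derivative is written by hand; in a JAX translation it would instead come from `jax.grad` applied to the loss. A coarse uniform grid is included for comparison, mirroring the two selection methods shown in Figure 3.

```python
# Illustrative constants (partial pressures in Pa); NOT taken from CLM.
GAMMA_STAR = 4.3   # CO2 compensation point
KC = 40.5          # Michaelis-Menten constant for CO2
KO = 27840.0       # Michaelis-Menten constant for O2
O2 = 20900.0       # O2 partial pressure
RD = 1.0           # dark respiration (umol m-2 s-1)


def sensitivity(ci):
    """dA/dVcmax for the Rubisco-limited rate (independent of Vcmax)."""
    return (ci - GAMMA_STAR) / (ci + KC * (1.0 + O2 / KO))


def rubisco_limited(vcmax, ci):
    """Rubisco-limited net assimilation A (umol m-2 s-1) at internal CO2 ci (Pa)."""
    return vcmax * sensitivity(ci) - RD


def loss_and_grad(vcmax, ci_obs, a_obs):
    """Mean squared error between modeled and observed A, and its derivative."""
    n = len(ci_obs)
    loss, grad = 0.0, 0.0
    for ci, a in zip(ci_obs, a_obs):
        f = sensitivity(ci)
        r = rubisco_limited(vcmax, ci) - a  # residual
        loss += r * r / n
        grad += 2.0 * r * f / n             # d(r^2)/dVcmax = 2 r f
    return loss, grad


def fit_vcmax_gd(ci_obs, a_obs, v0=20.0, lr=5.0, steps=200):
    """Plain gradient descent on Vcmax."""
    v = v0
    for _ in range(steps):
        _, g = loss_and_grad(v, ci_obs, a_obs)
        v -= lr * g
    return v


def fit_vcmax_uniform(ci_obs, a_obs, lo=10.0, hi=150.0, n=8):
    """Pick the lowest-loss value from an evenly spaced grid of candidates."""
    candidates = [lo + i * (hi - lo) / (n - 1) for i in range(n)]
    return min(candidates, key=lambda v: loss_and_grad(v, ci_obs, a_obs)[0])


# Synthetic observations generated with Vcmax = 62 (no noise).
CI_OBS = [10.0, 20.0, 30.0, 40.0, 50.0]
A_OBS = [rubisco_limited(62.0, ci) for ci in CI_OBS]

if __name__ == "__main__":
    print(fit_vcmax_gd(CI_OBS, A_OBS), fit_vcmax_uniform(CI_OBS, A_OBS))
```

With these synthetic data, gradient descent recovers the generating value of 62, while the grid (spacing 20) can only return its nearest candidate, 70.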

Paper Structure

This paper contains 14 sections, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Workflow for translating a climate model from Fortran to Python, using static analysis, code generation, and unit testing.
  • Figure 2: Comparing runtime of leaf-level photosynthesis in several Python translations with the original Fortran version. Runtime was measured on an Amazon EC2 G5.4xlarge instance with one NVIDIA A10G GPU.
  • Figure 3: Measured (points) and modeled (lines) relationship between the internal partial pressure of CO₂ (Pa) and the rate of assimilation (µmol m⁻² s⁻¹). The modeled values use the Vcmax parameter value selected by either uniform sampling (orange) or gradient descent (green).
  • Figure 4: Dependency graph for the function 'hybrid' from the leaf-level photosynthesis module. Each node corresponds to a function defined in this module, and each edge corresponds to a function call.
  • Figure 5: Visualization of the code chunking process. In step 1, we split a codebase into individual units using a parsing tool and trace the references between them. In step 2, we use those references to form a dependency graph.
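The unit-testing step in Figure 1 can be sketched as a small harness that runs a translated Python function on recorded inputs and compares its outputs with reference values within a tolerance. This is a minimal sketch under stated assumptions: `quadratic` stands in for a hypothetical translated routine, the expected tuples are hand-computed placeholders for reference outputs that would be recorded from the original Fortran code, and `run_unit_tests` and its tolerance are illustrative choices rather than the authors' test suite.

```python
import math
from typing import Callable, Sequence, Tuple

Case = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (inputs, expected outputs)


def quadratic(a: float, b: float, c: float) -> Tuple[float, float]:
    """Hypothetical Python translation of a Fortran quadratic-root routine.

    Returns the two real roots of a*x**2 + b*x + c = 0 using the numerically
    stable form q = -0.5 * (b + sign(b) * sqrt(b**2 - 4ac)).
    """
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        raise ValueError("complex roots")
    q = -0.5 * (b + math.copysign(math.sqrt(disc), b))
    return q / a, c / q


def run_unit_tests(func: Callable[..., Sequence[float]],
                   cases: Sequence[Case],
                   rel_tol: float = 1e-10) -> list:
    """Return (inputs, expected, got) for every case whose outputs do not match."""
    failures = []
    for args, expected in cases:
        got = func(*args)
        ok = len(got) == len(expected) and all(
            math.isclose(g, e, rel_tol=rel_tol) for g, e in zip(got, expected)
        )
        if not ok:
            failures.append((args, expected, got))
    return failures


# Placeholder reference cases: (a, b, c) -> roots in the order the routine returns them.
CASES = [
    ((1.0, -3.0, 2.0), (2.0, 1.0)),
    ((1.0, 0.0, -4.0), (-2.0, 2.0)),
    ((2.0, 5.0, -3.0), (-3.0, 0.5)),
]

if __name__ == "__main__":
    print(run_unit_tests(quadratic, CASES))  # [] when every case matches
```

A relative tolerance, rather than exact equality, allows for small floating-point differences between compilers and between Fortran and Python arithmetic.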