Paraformer: Parameterization of Sub-grid Scale Processes Using Transformers

Shuochen Wang, Nishant Yadav, Auroop R. Ganguly

TL;DR

This work addresses the persistent uncertainty in climate modeling arising from sub-grid scale processes by introducing Paraformer, a memory-aware encoder-only Transformer designed for climate parameterization on the ClimSim dataset. By leveraging temporal context through fixed-size windows and a dense output layer, Paraformer captures nonlinear dependencies in sub-grid variables and outperforms several traditional DL baselines, including an MLP, across two variable sets. Results show meaningful improvements in prediction accuracy (e.g., $MAE$ and $R^2$) for many sub-grid outputs, with stronger gains for variables exhibiting vertical structure, while some cloud-related quantities remain challenging. The study highlights the potential of attention mechanisms in climate parameterization, discusses computational considerations, and outlines avenues for future enhancements via spatiotemporal architectures and physics-informed constraints to enable online testing in real-geography GCMs.

Abstract

One of the major sources of uncertainty in the current generation of Global Climate Models (GCMs) is the representation of sub-grid scale physical processes. Over the years, a series of deep-learning-based parameterization schemes have been developed and tested on both idealized and real-geography GCMs. However, the datasets on which previous deep-learning models were trained either contain limited variables or have low spatio-temporal coverage, and so cannot fully simulate the parameterization process. Additionally, these schemes rely on classical architectures, while the latest attention mechanism used in Transformer models remains unexplored in this field. In this paper, we propose Paraformer, a "memory-aware" Transformer-based model trained on ClimSim, the largest dataset ever created for climate parameterization. Our results demonstrate that the proposed model successfully captures the complex non-linear dependencies in the sub-grid scale variables and outperforms classical deep-learning architectures. This work highlights the applicability of the attention mechanism in this field and provides valuable insights for developing future deep-learning-based climate parameterization schemes.

Paper Structure

This paper contains 9 sections, 11 figures, 5 tables.

Figures (11)

  • Figure 1: The architecture of Paraformer and the data processing workflow. The shape of the data is labeled in parentheses for each step. B and B_new represent two different batch dimensions of the data. In the raw data, B combines the spatial and temporal dimensions. num_features refers to the number of input (grid-scale) and output (sub-grid scale) variables. num_seq and seq_len represent the number and length of sequences, respectively. Metrics for prediction accuracy include Mean Absolute Error (MAE), Root Mean Square Error (RMSE) and Coefficient of Determination ($R^2$).
  • Figure 2: MAE, RMSE and $R^2$ of $dT/dt$ and $dq/dt$ using MLP and Paraformer on variable set v1. Each index on the x-axis represents a vertical level in the atmosphere starting from the top (i.e. level index 0 represents the top of the atmosphere). Units of non-energy flux variables are converted to a common energy unit, $W/m^2$ [yu2024climsim]. Negative $R^2$ values are not shown.
  • Figure 3: $R^2$ of daily-mean, zonal-mean $dT/dt$ and $dq/dt$ for MLP and Paraformer at different pressure levels in variable set v1. Yellow contours cover regions where $R^2 > 0.9$; orange contours cover regions where $R^2 > 0.7$.
  • Figure 4: Spatial distribution of $R^2$ using MLP (left column) and Paraformer (right column) for 8 scalar target variables in v1. The names of the variables are labeled on the left.
  • Figure A1: Same as Figure \ref{fig:1}, but for v2.
  • ...and 6 more figures
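
The data workflow and metrics named in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that rows of the raw array are ordered time-major (all grid columns for one time step, then the next time step) and that windows are non-overlapping; the paper's actual ordering and windowing may differ. The function names `make_windows`, `mae`, `rmse` and `r2`, and the parameter `n_cols`, are hypothetical.

```python
import numpy as np


def make_windows(x, n_cols, seq_len):
    """Reshape (B, num_features) into (B_new, seq_len, num_features).

    Assumption: row t * n_cols + c holds grid column c at time step t.
    Each output window holds seq_len consecutive time steps of one column.
    Trailing time steps that do not fill a whole window are dropped.
    """
    num_features = x.shape[1]
    n_time = x.shape[0] // n_cols
    num_seq = n_time // seq_len  # windows per grid column
    x = x[: num_seq * seq_len * n_cols]
    # (num_seq, seq_len, n_cols, F): split time into windows
    x = x.reshape(num_seq, seq_len, n_cols, num_features)
    # (num_seq, n_cols, seq_len, F): put the time axis inside each window
    x = x.transpose(0, 2, 1, 3)
    # B_new = num_seq * n_cols
    return x.reshape(num_seq * n_cols, seq_len, num_features)


def mae(y_true, y_pred):
    """Mean Absolute Error per output variable."""
    return np.mean(np.abs(y_true - y_pred), axis=0)


def rmse(y_true, y_pred):
    """Root Mean Square Error per output variable."""
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))


def r2(y_true, y_pred):
    """Coefficient of determination per output variable.

    Undefined (division by zero) if a target variable is constant.
    """
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot
```

With 5 time steps, 2 grid columns and `seq_len=2`, `make_windows` returns 4 windows of shape `(2, num_features)` and discards the fifth time step. Note that $R^2$ becomes negative whenever the prediction error exceeds the variance of the target, which is why such values can occur and are omitted from Figure 2.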