Paraformer: Parameterization of Sub-grid Scale Processes Using Transformers
Shuochen Wang, Nishant Yadav, Auroop R. Ganguly
TL;DR
This work addresses the persistent uncertainty in climate modeling arising from sub-grid scale processes by introducing Paraformer, a memory-aware, encoder-only Transformer designed for climate parameterization on the ClimSim dataset. By leveraging temporal context through fixed-size windows and a dense output layer, Paraformer captures nonlinear dependencies in sub-grid variables and outperforms several classical deep-learning baselines, including an MLP, across two variable sets. Results show meaningful improvements in prediction accuracy (e.g., MAE and $R^2$) for many sub-grid outputs, with stronger gains for variables exhibiting vertical structure, while some cloud-related quantities remain challenging. The study highlights the potential of attention mechanisms in climate parameterization, discusses computational considerations, and outlines avenues for future enhancements via spatiotemporal architectures and physics-informed constraints to enable online testing in real-geography GCMs.
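To make the "fixed-size windows" and the reported metrics concrete, the following is a minimal sketch in plain Python, not the authors' implementation. The helper names (`make_windows`, `mae`, `r2`), the overlapping stride-1 layout, and the toy numbers are all assumptions for illustration; the summary above does not state the window length or how windows are aligned with prediction targets.

```python
from typing import List, Sequence


def make_windows(steps: Sequence[Sequence[float]], window: int) -> List[List[Sequence[float]]]:
    """Group consecutive time steps into overlapping fixed-size windows.

    Assumption: stride 1, so window i holds steps i .. i + window - 1.
    A model with temporal "memory" would read the whole window at once.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    return [list(steps[i:i + window]) for i in range(len(steps) - window + 1)]


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy data (hypothetical): four time steps with one feature each.
toy_steps = [[0.0], [1.0], [2.0], [3.0]]
toy_windows = make_windows(toy_steps, window=3)

# Toy targets and predictions (hypothetical values).
toy_true = [1.0, 2.0, 3.0]
toy_pred = [1.0, 2.0, 4.0]
toy_mae = mae(toy_true, toy_pred)
toy_r2 = r2(toy_true, toy_pred)
```

With these toy inputs, `toy_windows` contains two windows of three steps each. In practice each step would carry the full vector of grid-column input variables rather than a single number.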
Abstract
One of the major sources of uncertainty in the current generation of Global Climate Models (GCMs) is the representation of sub-grid scale physical processes. Over the years, a series of deep-learning-based parameterization schemes have been developed and tested on both idealized and real-geography GCMs. However, the datasets on which previous deep-learning models were trained either contain a limited set of variables or have low spatiotemporal coverage, and so cannot fully simulate the parameterization process. Additionally, these schemes rely on classical architectures, while the attention mechanism used in Transformer models remains unexplored in this field. In this paper, we propose Paraformer, a "memory-aware" Transformer-based model trained on ClimSim, the largest dataset ever created for climate parameterization. Our results demonstrate that the proposed model successfully captures the complex non-linear dependencies in the sub-grid scale variables and outperforms classical deep-learning architectures. This work highlights the applicability of the attention mechanism in this field and provides valuable insights for developing future deep-learning-based climate parameterization schemes.
