MetMamba: Regional Weather Forecasting with Spatial-Temporal Mamba Model

Haoyu Qin; Yungang Chen; Qianchuan Jiang; Pengchao Sun; Xiancai Ye; Chao Lin

TL;DR

This work addresses accurate regional weather prediction with deep learning by introducing MetMamba, a limited-area Deep Learning based Weather Prediction (DLWP) backbone built on the Mamba state-space model. MetMamba processes spatial-temporal data natively and is evaluated alongside Swin- and AFNO-based backbones, with a training regime that couples the limited-area model (LAM) to a global host via lateral boundary conditions. The results show that MetMamba achieves superior or comparable performance to global baselines across most variables, exhibits fewer artifacts at long lead times, and confirms the viability of DLWP-LAM with global-host coupling. The study underscores the potential of state-space backbones for high-resolution regional forecasting and outlines paths to further improvement through larger datasets and better host models.

Abstract

Deep Learning based Weather Prediction (DLWP) models have been improving rapidly over the last few years, surpassing state-of-the-art numerical weather forecasts by significant margins. While much of the optimization effort has focused on training curricula that extend forecast range in the global context, two aspects remain less explored: limited area modeling and better backbones for weather forecasting. We show in this paper that MetMamba, a DLWP model built on a state-of-the-art state-space model, Mamba, offers notable performance gains and unique advantages over other popular backbones that use traditional attention mechanisms and neural operators. We also demonstrate the feasibility of deep learning based limited area modeling via coupled training with a global host model.
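
The coupling to a global host model can be pictured with a small sketch. The page does not spell out the merging scheme, so everything here is an assumption made for illustration: the linear blend, the band width `band`, and the function names `boundary_weight` and `merge_lbc` are hypothetical and not taken from the paper. The idea shown is only that values near the lateral boundary of the regional domain are drawn from the host forecast, while the interior is left to the limited-area model.

```python
import numpy as np

def boundary_weight(height, width, band):
    """Weight map: 1.0 in the interior, falling linearly to 0.0 at the domain edge.

    `band` (hypothetical parameter) is the width of the blending zone in grid cells.
    """
    rows = np.arange(height)
    cols = np.arange(width)
    # Distance, in grid cells, of each row and column from its nearest edge.
    dist_r = np.minimum(rows, height - 1 - rows)
    dist_c = np.minimum(cols, width - 1 - cols)
    # A cell's distance to the boundary is the smaller of the two.
    dist = np.minimum(dist_r[:, None], dist_c[None, :])
    return np.clip(dist / band, 0.0, 1.0)

def merge_lbc(lam_forecast, host_forecast, band=4):
    """Blend a LAM forecast with host-model values near the lateral boundary.

    Both arrays have shape (..., H, W) on the same regional grid (assumed:
    the host forecast has already been interpolated onto the LAM grid).
    """
    height, width = lam_forecast.shape[-2:]
    weight = boundary_weight(height, width, band)
    return weight * lam_forecast + (1.0 - weight) * host_forecast

# Usage: two variables on a 16 x 16 regional grid.
lam = np.ones((2, 16, 16))
host = np.zeros((2, 16, 16))
merged = merge_lbc(lam, host, band=4)
```

In an auto-regressive rollout, a blend of this kind would be applied after each forecast step so that the next step starts from a state that is consistent with the host model at the edges.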

Paper Structure (46 sections, 5 equations, 37 figures, 2 tables)

Introduction
Related Work
Neural Operators
Transformers
Mamba Model
Limited Area Modeling
Methodology
Dataset
Models
General Architecture
Swin Block
Swin-AFNO Fusion Block
Mamba-3D Block
DEM encoder
AdaLN conditioning
...and 31 more sections

Figures (37)

Figure 1: (a) General model architecture. The DLWP-LAM takes two primary inputs, the initial condition $IC_{0}$ at initialization time $t_{0}$ and $IC_{-1}$ at the previous initialization time $t_{0}-6h$ (6 hours earlier), and auxiliary inputs: constants from the Digital Elevation Model (DEM) and elapsed time. These are processed by a DEM-specific embedder, a conditioning module, or simple concatenation. The input tensor is then processed by $N$ blocks, and the output is decoded with a simple linear layer and a pixel shuffle operation. (b) DEM Embedder: an embedder that processes each kind of information from the DEM accordingly. (c) AdaLN Block: a block that integrates elapsed time within the year via adaLN, which regulates all blocks in the processing step.
Figure 2: Lateral Boundary Condition merging scheme, used in LBC adaptation training and inference rollout
Figure 3: Root Mean Square Error (RMSE) of headline variables against ground-truth ERA5 in 1418 5-day forecasts, evaluated on the test set of year 2022, for the three types of DLWP-LAMs discussed in this paper. We omit variable $10u$ here because of its similarity to variable $10v$.
Figure 4: Four steps of auto-regressive training: Normalized Root Mean Square Error (Norm. RMSE) of headline variables, compared against FourCastNet (SFNO) in 1418 5-day forecasts, evaluated on the test set of year 2022. We omit variable $10u$ here because of its similarity to variable $10v$. The LAM model improves at long lead times as the LBC-merging auto-regressive training progresses.
Figure 5: MetMamba architecture. (a) MetMamba model architecture: the model takes two primary inputs, the initial condition $IC_{0}$ at initialization time $t_{0}$ and $IC_{-1}$ at the previous initialization time $t_{0}-6h$ (6 hours earlier), and auxiliary inputs (DEM masks, elapsed time, etc.), which are processed by a specific embedder, a linear layer, a conditioning module, or simple concatenation. The spatial-temporal tensor is then processed by $N$ Mamba3D blocks, conditioned on elapsed time (year). The output is decoded with a simple linear layer and a pixel shuffle operation. (b) AdaLN-Mamba3D Block: a block that uses depth-wise 3D convolution and Mamba's selective scan to achieve token (spatial, temporal) mixing and channel mixing. The block integrates seasonal variation via adaLN. (c) SS3D operator: a module that flattens and rearranges the input spatial-temporal tensor into different memory layouts (scan routes) so that Mamba's selective scan can associate information along different directions.
...and 32 more figures
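
The flatten-and-rearrange step that the Figure 5(c) caption attributes to the SS3D operator can be illustrated with a minimal NumPy sketch. The captions above do not say which scan routes the paper uses or how many, so the three axis orders below, their reversed copies, and the function name `scan_routes` are assumptions chosen only to show the idea: the same spatial-temporal tensor yields different one-dimensional token sequences depending on the memory layout.

```python
import numpy as np

def scan_routes(x):
    """Flatten a (T, H, W, C) tensor into token sequences along several routes.

    Returns a dict mapping a route name to an array of shape (T*H*W, C).
    The choice of routes is illustrative, not the paper's.
    """
    channels = x.shape[-1]
    layouts = {
        "thw": x,                          # time-major; columns vary fastest
        "twh": x.transpose(0, 2, 1, 3),    # time-major; rows vary fastest
        "hwt": x.transpose(1, 2, 0, 3),    # space-major; time varies fastest
    }
    sequences = {}
    for name, layout in layouts.items():
        seq = layout.reshape(-1, channels)  # one token per (t, h, w) position
        sequences[name] = seq
        sequences[name + "_rev"] = seq[::-1]  # same route, opposite direction
    return sequences

# Usage: 2 time steps on a 3 x 4 grid with a single channel.
x = np.arange(2 * 3 * 4).reshape(2, 3, 4, 1)
routes = scan_routes(x)
```

Each sequence would then be passed to a one-dimensional sequence model such as a selective scan. In the `hwt` route, consecutive tokens are the same grid point at successive times, whereas in `thw` they are neighbouring columns at a fixed time.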
