HEAL-ViT: Vision Transformers on a spherical mesh for medium-range weather forecasting

Vivek Ramavajjala

TL;DR

HEAL-ViT is a novel architecture that applies ViT models on a spherical mesh, benefiting from both the spatial homogeneity of graph-based models and the efficient attention mechanisms exploited by transformers.

Abstract

In recent years, a variety of ML architectures and techniques have produced skillful medium-range weather forecasts. In particular, Vision Transformer (ViT)-based models (e.g. Pangu-Weather, FuXi) have shown strong performance, working nearly "out-of-the-box" by treating weather data as a multi-channel image on a rectilinear grid. While a rectilinear grid is appropriate for 2D images, weather data is inherently spherical and is therefore heavily distorted near the poles when placed on a rectilinear grid, so a disproportionate share of compute is spent modeling data near the poles. Graph-based methods (e.g. GraphCast) avoid this problem by mapping the longitude-latitude grid to a spherical mesh, but they are generally more memory-intensive and tend to need more compute for training and inference. While spatially homogeneous, the spherical mesh does not readily lend itself to modeling by ViT-based models, which implicitly rely on the rectilinear grid structure. We present HEAL-ViT, a novel architecture that uses ViT models on a spherical mesh, thus benefiting from both the spatial homogeneity enjoyed by graph-based models and the efficient attention mechanisms exploited by transformers. HEAL-ViT produces weather forecasts that outperform the ECMWF IFS on key metrics and that exhibit less bias accumulation and less blurring than other ML weather prediction models. Further, the lower compute footprint of HEAL-ViT makes it attractive for operational use, where other models in addition to a 6-hourly prediction model may be needed to produce the full set of required operational forecasts.
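The claim that a rectilinear grid spends disproportionate compute near the poles can be made concrete with a little arithmetic. The sketch below assumes a 0.25° equiangular latitude-longitude grid (chosen only for illustration, not taken from the paper) and compares the share of grid points poleward of 60° with the share of the sphere's surface area there; the helper names are hypothetical.

```python
import math


def polar_point_share(res_deg: float, cap_lat_deg: float) -> float:
    """Share of points on an equiangular lat-lon grid with |latitude| >= cap_lat_deg.

    Assumes latitude rows run from -90 to +90 inclusive (e.g. 721 rows at 0.25
    degrees) and that every row has the same number of longitude points, so the
    share of rows equals the share of grid points.
    """
    n_rows = int(round(180 / res_deg)) + 1
    lats = [-90 + i * res_deg for i in range(n_rows)]
    polar_rows = sum(1 for lat in lats if abs(lat) >= cap_lat_deg)
    return polar_rows / n_rows


def polar_area_share(cap_lat_deg: float) -> float:
    """Share of the sphere's surface area with |latitude| >= cap_lat_deg.

    Each polar cap above latitude phi has area 2*pi*R^2*(1 - sin(phi)); two caps
    divided by the total area 4*pi*R^2 give 1 - sin(phi).
    """
    return 1 - math.sin(math.radians(cap_lat_deg))


if __name__ == "__main__":
    points = polar_point_share(0.25, 60)
    area = polar_area_share(60)
    print(f"grid points poleward of 60 deg: {points:.3f}")
    print(f"surface area poleward of 60 deg: {area:.3f}")
    print(f"over-allocation factor: {points / area:.2f}")
```

On this assumed grid, 242 of the 721 latitude rows (about 34% of grid points) lie poleward of 60°, while that region covers only 1 − sin 60° ≈ 13% of the surface, so roughly 2.5 times as many points are spent there as area alone would warrant.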

Paper Structure

This paper contains 37 sections, 14 equations, 14 figures, and 2 tables.

Figures (14)

  • Figure 1: Swin transformers perform local attention within non-overlapping windows and, in alternate layers, shift the windows by half the window width so that cross-window connections can be learned.
  • Figure 2: HEALPix meshes at increasingly finer resolutions.
  • Figure 3: Flattened view of the 12 faces of the HEALPix mesh, each subdivided into 16 pixels. The HEALPix mesh provides a regular ordering of the mesh nodes, with each node having neighbors along the cardinal directions, so the whole mesh can be oriented North-South and East-West. Four of the faces are "central", along the equator, while four cover the northern hemisphere and four cover the southern hemisphere.
  • Figure 4: Windowed HEALPix mesh, with each HEALPix face containing 4 windows, and each window containing 4 pixels. This corresponds to a refinement level of $n=2$ with 192 fine pixels, and a window size parameter of $w=1$. Each window contains $(2^w)^2=4$ pixels.
  • Figure 5: Shifted windows on the HEALPix mesh, where each original window from Figure 4 is effectively shifted by one quadrant south and west.
  • ...and 9 more figures
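The window bookkeeping in Figures 1, 3 and 4 can be checked with a short sketch. It assumes, as the Figure 4 caption implies, that refinement level n gives 2^n × 2^n pixels per base face and that a window has side 2^w pixels; the function names are hypothetical. The partition and shift below are the generic Swin mechanism from Figure 1 applied to a single face stored as a 2D array. Note that `np.roll` wraps around that array's own edges, whereas on the HEALPix mesh pixels shifted across a face boundary belong to a neighboring face, so this is not HEAL-ViT's shifted-window scheme; the attention mask Swin uses for wrapped pixels is also omitted.

```python
import numpy as np


def healpix_counts(n: int, w: int) -> dict:
    """Pixel and window counts for a HEALPix mesh at refinement level n
    with square windows of side 2**w pixels (assumes 0 <= w <= n)."""
    if not 0 <= w <= n:
        raise ValueError("window level w must satisfy 0 <= w <= n")
    nside = 2 ** n  # pixels along each edge of a base face
    return {
        "faces": 12,
        "pixels_per_face": nside ** 2,
        "total_pixels": 12 * nside ** 2,
        "pixels_per_window": (2 ** w) ** 2,
        "windows_per_face": (nside // 2 ** w) ** 2,
    }


def window_partition(x: np.ndarray, ws: int) -> np.ndarray:
    """Split an (H, W, C) array into non-overlapping ws x ws windows.

    Returns shape (num_windows, ws * ws, C), windows in row-major order.
    """
    h, wd, c = x.shape
    if h % ws or wd % ws:
        raise ValueError("H and W must be divisible by the window size")
    x = x.reshape(h // ws, ws, wd // ws, ws, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (window row, window col, ws, ws, C)
    return x.reshape(-1, ws * ws, c)


def shift_then_partition(x: np.ndarray, ws: int) -> np.ndarray:
    """Swin-style shifted windows on one 2D array: cyclically shift by half
    a window along both axes, then partition into windows."""
    s = ws // 2
    return window_partition(np.roll(x, shift=(-s, -s), axis=(0, 1)), ws)


if __name__ == "__main__":
    print(healpix_counts(n=2, w=1))
    face = np.arange(16).reshape(4, 4, 1)  # one 4x4 face, one channel
    print(window_partition(face, ws=2)[:, :, 0])
    print(shift_then_partition(face, ws=2)[:, :, 0])
```

With n = 2 and w = 1 this reproduces the Figure 4 numbers: 192 pixels in total, 16 per face, 4 windows per face and 4 pixels per window. On the 4 × 4 example face, the first shifted window gathers one pixel from each of the four original windows, which is how shifting lets attention cross window boundaries.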