ClimaX: A foundation model for weather and climate
Tung Nguyen, Johannes Brandstetter, Ashish Kapoor, Jayesh K. Gupta, Aditya Grover
TL;DR
ClimaX presents a Transformer-based foundation model for weather and climate that can be pretrained on heterogeneous CMIP6 simulations and finetuned for diverse tasks, including global and regional forecasting, sub-seasonal-to-seasonal (S2S) prediction, climate projection, and downscaling. It introduces per-variable tokenization and cross-variable aggregation to handle multimodal, irregular climate data, combined with a randomized forecasting pretraining objective. Across ClimateBench and WeatherBench-inspired benchmarks, ClimaX demonstrates strong generalization and competitive performance with modest compute, and shows clear scaling behavior as data, model size, and resolution increase. This work suggests a viable path toward general, data-driven Earth system models that can adapt to a wide range of spatiotemporal scales and tasks.
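The per-variable tokenization and cross-variable aggregation described above can be illustrated with a minimal NumPy sketch. It assumes one time step of V gridded variables stored as an array of shape (V, H, W), one randomly initialized linear patch projection per variable, a per-variable embedding, and a single-head cross-attention in which one learnable query fuses the V variable tokens at each spatial location into a single token. All function names, shapes, and weights are hypothetical and chosen for illustration; this is not the code in the ClimaX repository.

```python
import numpy as np

def patchify(field, p):
    """Split one 2-D field (H, W) into non-overlapping p x p patches -> (L, p*p)."""
    H, W = field.shape
    assert H % p == 0 and W % p == 0, "H and W must be divisible by the patch size"
    patches = field.reshape(H // p, p, W // p, p).transpose(0, 2, 1, 3)
    return patches.reshape(-1, p * p)

def tokenize_per_variable(x, proj, var_embed, p):
    """x: (V, H, W); proj: (V, p*p, D), a separate linear map per variable;
    var_embed: (V, D). Each variable is patched and embedded on its own -> (V, L, D)."""
    return np.stack([patchify(x[v], p) @ proj[v] + var_embed[v] for v in range(x.shape[0])])

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def aggregate_variables(tokens, query, Wk, Wv):
    """Single-head cross-attention: one query vector (D,) attends over the V
    variable tokens at every spatial position. tokens: (V, L, D) -> (L, D)."""
    D = tokens.shape[-1]
    k = tokens @ Wk                                          # keys, (V, L, D)
    v = tokens @ Wv                                          # values, (V, L, D)
    scores = np.einsum("d,vld->lv", query, k) / np.sqrt(D)   # (L, V)
    weights = softmax(scores, axis=-1)                       # attention over variables
    return np.einsum("lv,vld->ld", weights, v)

# Hypothetical sizes: 3 variables on a 32 x 64 grid, 4 x 4 patches, width 16.
rng = np.random.default_rng(0)
V, H, W, p, D = 3, 32, 64, 4, 16
x = rng.standard_normal((V, H, W))
proj = rng.standard_normal((V, p * p, D)) * 0.1
var_embed = rng.standard_normal((V, D)) * 0.02
query = rng.standard_normal(D)
Wk = rng.standard_normal((D, D)) / np.sqrt(D)
Wv = rng.standard_normal((D, D)) / np.sqrt(D)

tokens = tokenize_per_variable(x, proj, var_embed, p)
fused = aggregate_variables(tokens, query, Wk, Wv)
print(tokens.shape)  # (3, 128, 16)
print(fused.shape)   # (128, 16)
```

Because each variable has its own projection and embedding in this sketch, a dataset that provides only a subset of variables can pass the matching slices of `proj` and `var_embed`. The aggregation step then maps any number of variable tokens to a sequence of L tokens whose length no longer depends on V, which keeps the cost of the Transformer backbone independent of how many variables a dataset contains.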
Abstract
Most state-of-the-art approaches for weather and climate modeling are based on physics-informed numerical models of the atmosphere. These approaches aim to model the non-linear dynamics and complex interactions between multiple variables, which are challenging to approximate. Additionally, many such numerical models are computationally intensive, especially when modeling atmospheric phenomena at fine-grained spatial and temporal resolutions. Recent data-driven approaches based on machine learning instead aim to directly solve a downstream forecasting or projection task by learning a data-driven functional mapping using deep neural networks. However, these networks are trained using curated and homogeneous climate datasets for specific spatiotemporal tasks, and thus lack the generality of numerical models. We develop and demonstrate ClimaX, a flexible and generalizable deep learning model for weather and climate science that can be trained using heterogeneous datasets spanning different variables, spatiotemporal coverage, and physical groundings. ClimaX extends the Transformer architecture with novel variable-encoding and variable-aggregation blocks that allow effective use of available compute while maintaining general utility. ClimaX is pretrained with a self-supervised learning objective on climate datasets derived from CMIP6. The pretrained ClimaX can then be finetuned to address a breadth of climate and weather tasks, including those that involve atmospheric variables and spatiotemporal scales unseen during pretraining. Compared to existing data-driven baselines, we show that this generality in ClimaX results in superior performance on benchmarks for weather forecasting and climate projections, even when pretrained at lower resolutions and compute budgets. The source code is available at https://github.com/microsoft/ClimaX.
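The self-supervised objective mentioned above is, per the TL;DR, a randomized forecasting objective: rather than training for one fixed horizon, each training example pairs a snapshot with a future snapshot at a randomly drawn lead time. A minimal NumPy sketch of that sampling step follows. It assumes snapshots stored at a fixed interval in an array of shape (T, V, H, W), a lead time drawn uniformly in whole steps, and a latitude-weighted mean squared error as the loss (a common choice for global gridded fields, assumed here rather than taken from this text). The function names and the latitude grid are hypothetical, and a persistence forecast stands in for the network so the example runs; in an actual model the lead time would be supplied as an extra input.

```python
import numpy as np

def sample_forecast_pair(series, max_lead_steps, rng):
    """series: (T, V, H, W) snapshots at a fixed time interval.
    Draws a random lead time in [1, max_lead_steps] steps and a random start
    index, and returns (input snapshot, lead_steps, target snapshot)."""
    T = series.shape[0]
    assert T > max_lead_steps, "series must be longer than the largest lead time"
    lead = int(rng.integers(1, max_lead_steps + 1))  # upper bound is exclusive
    t0 = int(rng.integers(0, T - lead))              # guarantees t0 + lead <= T - 1
    return series[t0], lead, series[t0 + lead]

def lat_weighted_mse(pred, target, lats_deg):
    """MSE over (V, H, W) fields with each latitude row weighted by cos(latitude),
    normalized so the weights average to 1."""
    w = np.cos(np.deg2rad(lats_deg))
    w = w / w.mean()
    return float(np.mean((pred - target) ** 2 * w[None, :, None]))

# Hypothetical toy data: 20 snapshots of 2 variables on a 32 x 64 grid.
rng = np.random.default_rng(0)
T, V, H, W = 20, 2, 32, 64
series = rng.standard_normal((T, V, H, W))
lats = np.linspace(-80.0, 80.0, H)  # hypothetical latitude centers, in degrees

x_in, lead, y_true = sample_forecast_pair(series, max_lead_steps=8, rng=rng)
y_pred = x_in  # persistence stand-in for a lead-time-conditioned model
loss = lat_weighted_mse(y_pred, y_true, lats)
print(lead, loss)
```

Drawing the lead time at random exposes a single set of weights to many forecast horizons during pretraining, which is one way to obtain a representation that can later be finetuned for tasks at different temporal scales.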
