A Physics-guided Multimodal Transformer Path to Weather and Climate Sciences

Jing Han, Hanting Chen, Kai Han, Xiaomeng Huang, Yongyun Hu, Wenjun Xu, Dacheng Tao, Ping Zhang

TL;DR

This work argues for replacing isolated AI approaches in weather and climate sciences with a physics-guided multimodal transformer that treats diverse observational sources (images, videos, time series, and text) as modalities and fuses them in a shared latent space. It outlines how physics priors can be injected at the outputs, within the model architecture (e.g., PINNs), and as inputs, to improve physical fidelity and generalization. The paper reviews existing AI categories (forecasting, classification/detection, re-scaling, universal embedding, relationship mining) and hybrid approaches (regularization, PDE-based models, input signals), and then proposes a unified path that uses next-token prediction to unify tasks across modalities. It also discusses practical considerations for scalability, data alignment, and handling noisy or incomplete data, emphasizing the potential for real-time, large-scale climate analysis and decision support. Overall, the framework aims to enhance accuracy, interpretability, and efficiency in weather and climate predictions by grounding data-driven methods in physical laws while leveraging multimodal information.

Abstract

With the rapid development of machine learning in recent years, many problems in meteorology can now be addressed using AI models. In particular, data-driven algorithms have significantly improved accuracy compared to traditional methods. Meteorological data is often transformed into 2D images or 3D videos, which are then fed into AI models for learning. Additionally, these models often incorporate physical signals, such as temperature, pressure, and wind speed, to further enhance accuracy and interpretability. In this paper, we review several representative AI + Weather/Climate algorithms and propose a new paradigm where observational data from different perspectives, each with distinct physical meanings, are treated as multimodal data and integrated via transformers. Furthermore, key weather and climate knowledge can be incorporated through regularization techniques to further strengthen the model's capabilities. This new paradigm is versatile and can address a variety of tasks, offering strong generalizability. We also discuss future directions for improving model accuracy and interpretability.
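As a rough illustration of incorporating physical knowledge through regularization, the sketch below adds a physics penalty to an ordinary data-fitting loss. Everything in it is an assumption made for illustration and is not taken from the paper: the constraint (a 2D wind field that should be close to divergence-free), the function names `divergence`, `mean_square`, and `physics_regularized_loss`, and the weight `lam`.

```python
def divergence(u, v, dx=1.0, dy=1.0):
    """Central-difference divergence du/dx + dv/dy at interior grid points.

    u and v are lists of rows indexed [y][x] on a regular grid
    (a hypothetical layout chosen for this sketch).
    """
    ny, nx = len(u), len(u[0])
    return [
        [
            (u[j][i + 1] - u[j][i - 1]) / (2 * dx)
            + (v[j + 1][i] - v[j - 1][i]) / (2 * dy)
            for i in range(1, nx - 1)
        ]
        for j in range(1, ny - 1)
    ]


def mean_square(rows):
    """Mean of squared entries of a 2D list."""
    values = [x for row in rows for x in row]
    return sum(x * x for x in values) / len(values)


def physics_regularized_loss(pred_u, pred_v, obs_u, obs_v, lam=0.1):
    """Data-fitting error plus a weighted penalty on the physics residual.

    lam is an assumed hyperparameter that trades off fit against
    the (illustrative) non-divergence constraint.
    """
    err_u = [[p - o for p, o in zip(pr, ob)] for pr, ob in zip(pred_u, obs_u)]
    err_v = [[p - o for p, o in zip(pr, ob)] for pr, ob in zip(pred_v, obs_v)]
    data_term = mean_square(err_u) + mean_square(err_v)
    physics_term = mean_square(divergence(pred_u, pred_v))
    return data_term + lam * physics_term
```

In this sketch, a prediction that matches the observations but violates the constraint still incurs a nonzero loss, which is the general idea behind output-side regularization. A real model would use a differentiable framework and constraints appropriate to the variables being predicted.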

Paper Structure

This paper contains 24 sections, 17 equations, 2 figures, and 1 table.

Figures (2)

  • Figure 1: Illustration of the categories of AI models in climate science. The inputs, outputs, and neural network architectures of these models usually vary significantly.
  • Figure 2: Illustration of the physics-guided multimodal transformer path to weather and climate sciences. The circles labelled 1, 2, and 3 represent adding physical knowledge to the output, the model, and the input, respectively.