MeteorPred: A Meteorological Multimodal Large Model and Dataset for Severe Weather Event Prediction

Shuo Tang, Jian Xu, Jiadong Zhang, Yi Chen, Qizhao Jin, Lingdong Shen, Chenglin Liu, Shiming Xiang

TL;DR

This work tackles AI-based prediction of severe weather events by introducing MP-Bench, a large-scale multimodal dataset that pairs 4D ERA5 meteorological fields with CMA warnings across China, and the Meteorological Multimodal Large Model (MMLM), which ingests these 4D inputs directly. The model uses three plug-and-play modules, Dynamic Temporal Gating Fusion (DTGF), Text-Driven Gaussian Spatial Masking (TGS), and Text-Driven Channel Attention (TGCA), to fuse temporal, spatial, and vertical information before passing it to an LLM for warning generation. Experiments show that MMLM surpasses open-source baselines and a closed-source GPT-4o reference across multiple QA tasks, with the best variant achieving substantial gains in MC accuracy and macro-F1, while the NSW task remains challenging. A cross-regional generalization study indicates potential for transfer to other geographic domains, positioning MP-Bench as a foundation for AI-driven severe weather forecasting with practical impact on warnings and decision-making.

Abstract

Timely and accurate forecasts of severe weather events are essential for early warning and for constraining downstream analysis and decision-making. Because severe weather event prediction still depends on subjective, time-consuming expert interpretation, end-to-end "AI weather station" systems are emerging, but they face three major challenges: (1) scarcity of severe weather event samples; (2) imperfect alignment between high-dimensional meteorological data and textual warnings; (3) the inability of current multimodal language models to process high-dimensional meteorological inputs effectively or to capture their complex spatiotemporal dependencies. To address these challenges, we introduce MP-Bench, the first large-scale multimodal dataset for severe weather event prediction, comprising 421,363 pairs of raw multi-year meteorological data and corresponding text captions, covering a wide range of severe weather scenarios. On top of this dataset, we develop a Meteorological Multimodal Large Model (MMLM) that directly ingests 4D meteorological inputs. It is designed to accommodate the unique characteristics of 4D meteorological data flow, incorporating three plug-and-play adaptive fusion modules that enable dynamic feature extraction and integration across temporal sequences, vertical pressure layers, and spatial dimensions. Extensive experiments on MP-Bench show that MMLM achieves strong performance across multiple tasks, demonstrating effective severe weather understanding and representing a key step toward automated, AI-driven severe weather event forecasting systems. Our source code and dataset will be made publicly available.

Paper Structure

This paper contains 37 sections, 3 equations, 14 figures, 8 tables.

Figures (14)

  • Figure 1: Conceptual illustration for severe weather event prediction using the Meteorological Multimodal Large Model (MMLM).
  • Figure 2: Overview of the MMLM framework and its core components. (a) shows the MMLM architecture, where outputs from the DTGF, TGCA, and TGS modules are concatenated and integrated in a learnable Fusion Layer before being fed into the LLM. (b), (c), and (d) illustrate the three plug-and-play modules, where color intensity represents the adaptive weights across temporal, channel, and spatial dimensions. (e) shows four QA samples from MP-Bench.
  • Figure 3: Distribution of four QA task types in MP-Bench, including MC, T/F, RSW, NSW.
  • Figure 4: Weight distribution patterns of the three plug-and-play modules: (a) DTGF temporal weight differences (positive: higher for red warnings; negative: higher for blue warnings); (b) TGS spatial attention map; (c) TGCA channel weights of the V-component of wind across pressure levels. Additional examples are in the appendix.
  • Figure 5: Spatial distribution of severe weather events in China and the US.
  • ...and 9 more figures
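
The data flow described in the Figure 2 caption (adaptive weights along one axis per module, then concatenation and a learnable fusion layer) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `softmax`, `weighted_pool`, and `fuse` are hypothetical, the gate logits and fusion matrix are fixed placeholder values rather than learned parameters, and the internals of DTGF, TGCA, and TGS are not specified in this section, so each branch is reduced here to a plain softmax-weighted pooling over its own axis.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def weighted_pool(features, scores):
    """Collapse one axis with adaptive weights.

    features: list of N vectors (each of length D), e.g. one per time step.
    scores:   list of N gate logits (placeholders; a real module would learn them).
    Returns a single length-D vector.
    """
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]

def fuse(branch_outputs, fusion_matrix):
    """Concatenate branch vectors, then apply a linear map (one row per output dim)."""
    concat = [x for branch in branch_outputs for x in branch]
    return [sum(w * x for w, x in zip(row, concat)) for row in fusion_matrix]

# Toy inputs: three branches standing in for temporal, channel, and spatial weighting.
temporal = weighted_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])  # equal weights
channel = weighted_pool([[2.0, 2.0], [4.0, 4.0]], [0.0, 0.0])   # equal weights
spatial = weighted_pool([[1.0, 1.0]], [5.0])                    # single location

# Placeholder fusion layer: one output unit that sums all six concatenated values.
fusion_matrix = [[1.0] * 6]
fused = fuse([temporal, channel, spatial], fusion_matrix)
print(fused)
```

In the paper's framework the fused representation is what gets passed to the LLM; here it is a single number only because the toy fusion matrix has one row.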