MeteorPred: A Meteorological Multimodal Large Model and Dataset for Severe Weather Event Prediction
Shuo Tang, Jian Xu, Jiadong Zhang, Yi Chen, Qizhao Jin, Lingdong Shen, Chenglin Liu, Shiming Xiang
TL;DR
This work tackles the challenge of predicting severe weather events with AI by introducing MP-Bench, a large-scale multimodal dataset that pairs 4D ERA5 meteorological fields with CMA warnings across China, and the Meteorological Multimodal Large Model (MMLM), which ingests these 4D inputs. The model uses three plug-in modules, Dynamic Temporal Gating Fusion (DTGF), Text-Driven Gaussian Spatial Masking (TGS), and Text-Driven Channel Attention (TGCA), to fuse temporal, spatial, and vertical information before passing it to an LLM for warning generation. Experiments show that MMLM surpasses open-source baselines and a closed-source GPT-4o reference across multiple QA tasks, with the best variant achieving substantial gains in MC accuracy and macro F1, while NSW remains challenging. The cross-regional generalization study demonstrates the potential for transfer to other geographic domains, highlighting MP-Bench as a foundation for AI-driven severe weather forecasting with practical impact on warnings and decision-making.
Abstract
Timely and accurate forecasts of severe weather events are essential for early warning and for constraining downstream analysis and decision-making. Because severe weather event prediction still depends on subjective, time-consuming expert interpretation, end-to-end "AI weather station" systems are emerging, but they face three major challenges: (1) severe weather event samples are scarce; (2) high-dimensional meteorological data and textual warnings are imperfectly aligned; (3) current multimodal language models cannot effectively process high-dimensional meteorological inputs or capture their complex spatiotemporal dependencies. To address these challenges, we introduce MP-Bench, the first large-scale multimodal dataset for severe weather event prediction, comprising 421,363 pairs of raw multi-year meteorological data and corresponding text captions and covering a wide range of severe weather scenarios. On top of this dataset, we develop a Meteorology Multimodal Large Model (MMLM) that directly ingests 4D meteorological inputs. The model is designed to accommodate the unique characteristics of 4D meteorological data flow: it incorporates three plug-and-play adaptive fusion modules that enable dynamic feature extraction and integration across temporal sequences, vertical pressure layers, and spatial dimensions. Extensive experiments on MP-Bench show that MMLM achieves strong performance across multiple tasks, demonstrating effective severe weather understanding and representing a key step toward automated, AI-driven severe weather event forecasting systems. Our source code and dataset will be made publicly available.
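
The abstract does not describe the internals of the fusion modules, so the following is only a generic illustration of what gated fusion over the time axis can look like, not the authors' DTGF. It assumes each time step has already been flattened into a feature vector, and that a hypothetical learned vector `gate_weights` scores each step; the function name `gated_temporal_fusion` is likewise an assumption made for this sketch.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_temporal_fusion(
    frames: List[List[float]], gate_weights: List[float]
) -> Tuple[List[float], List[float]]:
    """Fuse T per-time-step feature vectors into a single vector.

    frames:       T vectors of length D (one per time step; assumed to be
                  flattened over pressure levels and the spatial grid).
    gate_weights: hypothetical learned vector of length D used to score
                  each time step (an assumption, not from the paper).
    Returns (fused_vector, gates), where the gates sum to 1.
    """
    # One scalar score per time step: dot product with the gate weights.
    scores = [sum(w * x for w, x in zip(gate_weights, frame)) for frame in frames]
    # Normalise the scores into gates that weight each time step.
    gates = softmax(scores)
    # Weighted sum of the time steps, computed per feature dimension.
    dim = len(frames[0])
    fused = [sum(g * frame[d] for g, frame in zip(gates, frames)) for d in range(dim)]
    return fused, gates


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    fused, gates = gated_temporal_fusion(frames, gate_weights=[2.0, 0.0])
    print("gates:", gates)
    print("fused:", fused)
```

With zero gate weights every time step receives the same gate, so the fusion reduces to a plain temporal average. Non-zero weights let the steps that score highest dominate the fused representation.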
