ClimateBench-M: A Multi-Modal Climate Data Benchmark with a Simple Generative Method
Dongqi Fu, Yada Zhu, Zhining Liu, Lecheng Zheng, Xiao Lin, Zihao Li, Liri Fang, Katherine Tieu, Onkar Bhardwaj, Kommy Weldemariam, Hanghang Tong, Hendrik Hamann, Jingrui He
TL;DR
ClimateBench-M proposes a multi-modal climate benchmark that aligns ERA5 time series, NOAA extreme weather records, and NASA HLS satellite imagery on a unified spatiotemporal grid. It supports three tasks: weather forecasting, thunderstorm alerting, and crop segmentation. It also introduces SGM (Simple Generative Method), an encoder-decoder framework with dual pipelines and a causality-aware training objective that combines a variational DAG approach with neural Granger causality. SGM performs strongly on forecasting, anomaly detection, and segmentation. In the experiments, SGM and its persistence-enhanced variant improve over baselines in MAE for forecasting, AUC-ROC for anomaly detection (thunderstorm alerting), and IoU/accuracy for crop segmentation. The work demonstrates the value of integrated multi-modal climate benchmarks for robust, generalizable climate modeling. It also points to future directions such as additional modalities and language-grounded representations. Overall, ClimateBench-M provides a scalable platform and a competitive generative baseline for future climate AI research and practical forecasting.
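The three evaluation metrics named above have standard definitions, and the sketch below shows them for readers new to them. It is a minimal, stdlib-only illustration and is not taken from the ClimateBench-M code. The function names are hypothetical. Segmentation is assumed to be binary (crop vs. non-crop), so the paper's exact averaging, for example per-class mean IoU, may differ. AUC-ROC is computed by its pairwise (Mann-Whitney) definition: the probability that a randomly chosen positive example outscores a randomly chosen negative one.

```python
from typing import Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between observed and forecast values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def auc_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC-ROC: fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a win. This is O(P*N), fine for a sketch but not for large data.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def iou_and_accuracy(mask_true: Sequence[int], mask_pred: Sequence[int]) -> Tuple[float, float]:
    """IoU of the positive class and pixel accuracy for flattened binary masks."""
    if len(mask_true) != len(mask_pred) or not mask_true:
        raise ValueError("masks must be non-empty and of equal length")
    pairs = list(zip(mask_true, mask_pred))
    inter = sum(1 for t, p in pairs if t == 1 and p == 1)
    union = sum(1 for t, p in pairs if t == 1 or p == 1)
    correct = sum(1 for t, p in pairs if t == p)
    iou = inter / union if union else 1.0  # both masks empty: treat as perfect overlap
    return iou, correct / len(pairs)


if __name__ == "__main__":
    print(mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))            # 0.5
    print(auc_roc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))     # 0.75
    print(iou_and_accuracy([1, 1, 0, 0], [1, 0, 1, 0]))     # (0.333..., 0.5)
```

Lower MAE is better, while higher AUC-ROC, IoU, and accuracy are better. This is why the results above report improvements in different directions across the three tasks.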
Abstract
Climate science studies the structure and dynamics of Earth's climate system and seeks to understand how the climate changes over time. Climate data are usually stored as time series that record climate features, geolocation, time attributes, etc. Recently, climate benchmarks have received considerable research attention. Beyond the most common task, weather forecasting, several pioneering benchmarks have extended the modality. Examples include domain-specific applications such as tropical cyclone intensity prediction and flash flood damage estimation, as well as climate statements and confidence levels expressed in natural language. To further motivate the development of artificial general intelligence for climate science, this paper makes two contributions. First, we contribute a multi-modal climate benchmark, ClimateBench-M. It aligns (1) time-series climate data from ERA5, (2) extreme weather event data from NOAA, and (3) satellite image data from NASA HLS at a unified spatiotemporal granularity. Second, for each data modality, we propose a simple but strong generative method that achieves competitive performance on the weather forecasting, thunderstorm alert, and crop segmentation tasks in ClimateBench-M. The data and code of ClimateBench-M are publicly available at https://github.com/iDEA-iSAIL-Lab-UIUC/ClimateBench-M.
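The abstract does not detail how the modalities are aligned. The sketch below shows one generic way to bring point records onto a shared spatiotemporal grid: snap each record to a (latitude cell, longitude cell, day) key and join on that key. Several assumptions are involved. The 0.25° × daily resolution is chosen only for illustration and is not necessarily the benchmark's granularity. The record layouts are also illustrative. The helper names `grid_key` and `align` are hypothetical and do not come from the ClimateBench-M repository. In this sketch, time-series features are averaged within each cell-day, and the cell-day is labeled positive if any extreme-weather event falls in it. Satellite tiles could in principle be keyed the same way, but that step is omitted here.

```python
import math
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Sequence, Tuple

GridKey = Tuple[int, int, str]  # (latitude cell index, longitude cell index, ISO date)


def grid_key(lat: float, lon: float, ts: datetime, res_deg: float = 0.25) -> GridKey:
    """Snap a point observation to a grid cell and calendar day.

    ASSUMPTION: 0.25-degree cells and daily bins are illustrative choices only.
    """
    return (math.floor(lat / res_deg), math.floor(lon / res_deg), ts.date().isoformat())


def align(
    series_rows: Iterable[Tuple[float, float, datetime, Sequence[float]]],
    event_rows: Iterable[Tuple[float, float, datetime]],
    res_deg: float = 0.25,
) -> Dict[GridKey, dict]:
    """Average time-series features per cell-day and attach a binary event label."""
    sums: Dict[GridKey, List[float]] = {}
    counts: Dict[GridKey, int] = defaultdict(int)
    for lat, lon, ts, feats in series_rows:
        key = grid_key(lat, lon, ts, res_deg)
        if key not in sums:
            sums[key] = [0.0] * len(feats)
        sums[key] = [a + b for a, b in zip(sums[key], feats)]
        counts[key] += 1
    # A cell-day is positive if at least one event record falls inside it.
    event_keys = {grid_key(lat, lon, ts, res_deg) for lat, lon, ts in event_rows}
    return {
        key: {"features": [v / counts[key] for v in total], "event": key in event_keys}
        for key, total in sums.items()
    }


if __name__ == "__main__":
    series = [
        (40.10, -88.20, datetime(2017, 6, 1, 0), [290.0, 0.0]),
        (40.20, -88.10, datetime(2017, 6, 1, 12), [294.0, 2.0]),
        (41.00, -88.20, datetime(2017, 6, 1, 6), [288.0, 1.0]),
    ]
    events = [(40.15, -88.05, datetime(2017, 6, 1, 18))]
    for key, value in sorted(align(series, events).items()):
        print(key, value)
```

Keying by integer cell indices plus an ISO date keeps the join exact and hashable. A production pipeline would more likely regrid with a dedicated library and handle time zones and cell boundaries explicitly.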
