
CLLMate: A Multimodal Benchmark for Weather and Climate Events Forecasting

Haobo Li, Zhaowei Wang, Jiachen Wang, Yueya Wang, Alexis Kai Hon Lau, Huamin Qu

TL;DR

This paper introduces the Weather and Climate Event Forecasting (WCEF) task and the CLLMate dataset, which aligns ERA5 meteorological rasters with expert-validated environmental news events to forecast textual weather events and their consequences. It benchmarks 23 multimodal large language models across closed-source, open-source, and fine-tuned variants, revealing that while models can surpass random baselines, performance on consequence forecasting remains limited and highly dependent on task-specific alignment. The results underscore the value of fine-tuning and the need for domain-optimized architectures that can better bridge numerical meteorology with textual narratives. CLLMate serves as a foundational benchmark, enabling future research on integrating multimodal data for actionable, narratively grounded weather and climate forecasting. The work also highlights opportunities to expand modalities and incorporate richer knowledge representations to improve causal reasoning in environmental contexts.

Abstract

Forecasting weather and climate events is crucial for taking appropriate measures to mitigate environmental hazards and minimize losses. However, existing environmental forecasting research focuses narrowly on predicting numerical meteorological variables (e.g., temperature), neglecting the translation of these variables into actionable textual narratives of events and their consequences. To bridge this gap, we propose Weather and Climate Event Forecasting (WCEF), a new task that leverages numerical meteorological raster data and textual event data to predict weather and climate events. This task is challenging because multimodal data are difficult to align and supervised datasets are lacking. To address these challenges, we present CLLMate, the first multimodal dataset for WCEF, built from 26,156 environmental news articles aligned with ERA5 reanalysis data. We systematically benchmark 23 multimodal large language models (MLLMs) on CLLMate, including closed-source, open-source, and our own fine-tuned models. Our experiments reveal the advantages and limitations of existing MLLMs and demonstrate the value of CLLMate for training and benchmarking models on the WCEF task.
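
The alignment step described above pairs each textual event with the meteorological raster covering its place and time. This section does not spell out the paper's exact procedure, so the following is only a minimal sketch under stated assumptions: a gridded array shaped (time, lat, lon) with daily steps, 1-D coordinate arrays with longitudes assumed to run from 0 to 360 degrees, and a hypothetical `Event` record holding a date and a latitude/longitude bounding box (compare the rectangles in Figure 2). `Event`, `crop_raster`, and the `lead_days` window are illustrative names, not the CLLMate API.

```python
from dataclasses import dataclass
from datetime import date, timedelta

import numpy as np


@dataclass
class Event:
    """Hypothetical record for one extracted news event (not the CLLMate schema)."""
    day: date          # date the event is reported to occur
    lat_min: float     # bounding box in degrees; longitudes in -180..180
    lat_max: float
    lon_min: float
    lon_max: float
    text: str          # textual event description, used as the forecasting target


def crop_raster(raster, times, lats, lons, event, lead_days=1):
    """Illustrative helper (hypothetical, not from the paper): slice a
    (time, lat, lon) raster to the event's box for the `lead_days` days
    strictly before the event date.

    Assumes the box does not cross the antimeridian.
    """
    # Map 0..360 longitudes onto -180..180 so they compare with the box.
    lons_180 = ((lons + 180.0) % 360.0) - 180.0
    start = event.day - timedelta(days=lead_days)
    t_mask = np.array([start <= t < event.day for t in times])
    # Boolean masks work whether latitudes are stored ascending or descending.
    lat_mask = (lats >= event.lat_min) & (lats <= event.lat_max)
    lon_mask = (lons_180 >= event.lon_min) & (lons_180 <= event.lon_max)
    # np.ix_ turns the three 1-D masks into an open mesh, selecting a sub-cube.
    return raster[np.ix_(t_mask, lat_mask, lon_mask)]


if __name__ == "__main__":
    # Synthetic 1-degree global grid with 5 daily steps (placeholder values).
    times = [date(2021, 7, 18) + timedelta(days=i) for i in range(5)]
    lats = np.arange(90, -91, -1.0)   # 90 .. -90, descending
    lons = np.arange(0, 360, 1.0)     # 0 .. 359
    raster = np.zeros((len(times), lats.size, lons.size))

    # Hypothetical box around Zhengzhou for the July 2021 flood (cf. Figure 3).
    event = Event(date(2021, 7, 20), 34.0, 36.0, 113.0, 115.0,
                  "Heavy rain floods Zhengzhou")
    patch = crop_raster(raster, times, lats, lons, event, lead_days=2)
    print(patch.shape)  # (2, 3, 3): two prior days x three lats x three lons
    pair = (patch, event.text)  # (model input, textual target)
```

Boolean masks are used instead of index arithmetic so the sketch does not depend on the storage order of the coordinates; a real pipeline would more likely read ERA5 files with a labeled-array library and would also need to handle boxes that wrap around the antimeridian.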

Paper Structure (60 sections, 2 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: The CLLMate framework is designed to extract textual weather and climate events and align them with meteorological raster data for the WCEF task.
  • Figure 2: Spatial distribution of extracted events. Each rectangle represents an extracted event. The events span most global regions, with notable concentrations in East Asia, North America, and Europe.
  • Figure 3: Temporal distribution of extracted events. The events span a long time period, from 2015 to 2023. A notable outlier in the number of events occurred due to the catastrophic flooding in Zhengzhou in July 2021.
  • Figure 4: Distribution of subcategories within the meteorological phenomena category (3,979 of 7,747 events). The distribution is imbalanced, reflecting the nature of event reporting in the news.
  • Figure 5: Distribution of subcategories within the consequences category (3,768 of 7,747 events). The distribution is imbalanced, reflecting the nature of event reporting in the news.