WeatherReal: A Benchmark Based on In-Situ Observations for Evaluating Weather Models
Weixin Jin, Jonathan Weyn, Pengcheng Zhao, Siqi Xiang, Jiang Bian, Zuliang Fang, Haiyu Dong, Hongyu Sun, Kit Thambiratnam, Qi Zhang
TL;DR
WeatherReal presents a benchmark for evaluating weather models using in-situ near-surface observations to overcome biases inherent in reanalysis datasets like ERA5. It introduces WeatherReal-ISD and WeatherReal-Synoptic, plus MSN Weather user reports, supported by a rigorous, multi-step quality-control pipeline and a station-merging strategy to produce high-fidelity observations. The paper demonstrates the benchmark’s value through dataset analyses, case studies of heatwaves, heavy rainfall, and tropical cyclones, and example evaluations comparing data-driven forecasts with numerical weather prediction (NWP) baselines. It also outlines tasks and provisional leaderboards to foster community-driven development of operation-ready, user-focused forecasts, with plans to expand data sources and extend evaluation capabilities over time.
Abstract
In recent years, AI-based weather forecasting models have matched or even outperformed numerical weather prediction systems. However, most of these models have been trained and evaluated on reanalysis datasets like ERA5. These datasets, being products of numerical models, often diverge substantially from actual observations in crucial variables such as near-surface temperature, wind, precipitation, and clouds, all of which hold significant public interest. To address this divergence, we introduce WeatherReal, a novel benchmark dataset for weather forecasting, derived from global near-surface in-situ observations. WeatherReal also features a publicly accessible quality control and evaluation framework. This paper details the sources and processing methodologies underlying the dataset, and further illustrates the advantage of in-situ observations in capturing hyper-local and extreme weather through comparative analyses and case studies. Using WeatherReal, we evaluate several data-driven models and compare them with leading numerical models. Our work aims to advance AI-based weather forecasting research towards a more application-focused and operation-ready approach.
