WeatherReal: A Benchmark Based on In-Situ Observations for Evaluating Weather Models

Weixin Jin; Jonathan Weyn; Pengcheng Zhao; Siqi Xiang; Jiang Bian; Zuliang Fang; Haiyu Dong; Hongyu Sun; Kit Thambiratnam; Qi Zhang

TL;DR

WeatherReal presents a benchmark for evaluating weather models using in-situ near-surface observations to overcome biases inherent in reanalysis datasets like ERA5. It introduces WeatherReal-ISD and WeatherReal-Synoptic, plus MSN Weather user reports, supported by a rigorous, multi-step quality-control pipeline and station-merging strategy to produce high-fidelity observations. The paper demonstrates the benchmark’s value through dataset analyses, case studies of heatwaves, heavy rainfall, and tropical cyclones, and example evaluations comparing data-driven forecasts with NWP baselines. It also outlines tasks and provisional leaderboards to foster community-driven development toward operation-ready, user-focused forecasting improvements, with a plan to expand data sources and extend evaluation capabilities over time.

Abstract

In recent years, AI-based weather forecasting models have matched or even outperformed numerical weather prediction systems. However, most of these models have been trained and evaluated on reanalysis datasets like ERA5. These datasets, being products of numerical models, often diverge substantially from actual observations in crucial variables such as near-surface temperature, wind, precipitation, and clouds, all of which hold significant public interest. To address this divergence, we introduce WeatherReal, a novel benchmark dataset for weather forecasting, derived from global near-surface in-situ observations. WeatherReal also features a publicly accessible quality control and evaluation framework. This paper details the sources and processing methodologies underlying the dataset, and further illustrates the advantage of in-situ observations in capturing hyper-local and extreme weather through comparative analyses and case studies. Using WeatherReal, we evaluated several data-driven models and compared them with leading numerical models. Our work aims to advance AI-based weather forecasting research towards a more application-focused and operation-ready approach.

Paper Structure (33 sections, 18 figures, 4 tables)

Introduction
Datasets
WeatherReal-ISD
WeatherReal-Synoptic
User Reports from MSN Weather
Advanced Data Processing
Data Extraction
Station Merging
Quality Control
Value Range Check
Time Series Check
Cross-Variable Check
Neighboring Stations Check
Flag Refinement
Algorithm Integration
...and 18 more sections

Figures (18)

Figure 1: Example time series of primary and merged stations. (a) 2m temperature of the primary station 99999913752 and (b) the merged stations 72215813752 and 7221589999 in 2023. (c) 2m temperature and (d) surface pressure of stations 72594524283 and 72594024213 in 2023.
Figure 2: (a) Distribution of surface pressure differences between station observation and ERA5 at station 40310099999 in 2023, along with Gaussian fitting, and (b) the raw time series from the station and ERA5. The gray shaded area indicates the range within which the station observations will pass the quality control checks. (c) Identified spike errors in 2m temperature at station 12822099999 in 2023. (d) Unflagged spike in 2m temperature at station 94105099999 in 2023. (e) Spike unset by diurnal cycle check in 2m temperature at station 28612099999 in 2023. (f) Identified persistence errors in 2m temperature at station 06138099999 in 2023.
Figure 3: (a) Time series of dew point temperature observation and ERA5 at station 06022499999 in 2023 and (b) their scatter plot with the x-axis representing station observation and the y-axis representing ERA5. (c) Time series of mean sea-level pressure observations and ERA5 at station 48963099999 in 2023 and (d) their scatter plot. The grey dashed lines in (b) and (d) mark the diagonal of equality between station observation and ERA5.
Figure 4: (a) Time series of surface pressure at station 71298099999 and its neighboring stations in 2023; (b) The spatial location of the station 71298099999 and its neighboring stations.
Figure 5: (a) Time series of mean sea-level pressure at station 59658099999 (Zhanjiang, China) during July 16-18, 2023; (b) The spatial distribution of mean sea-level pressure and wind vectors from ERA5 at 15:00 UTC on July 17, 2023, with a star indicating the location of the station; (c) Observations of mean sea-level pressure and wind barbs from nearby stations at the same timestamp; (d) The bias of mean sea-level pressure from ERA5 compared to station observations at the same timestamp.
...and 13 more figures
