Weather2K: A Multivariate Spatio-Temporal Benchmark Dataset for Meteorological Forecasting Based on Real-Time Observation Data from Ground Weather Stations
Xun Zhu, Yutong Xiong, Ming Wu, Gaozhen Nie, Bin Zhang, Ziheng Yang
TL;DR
Weather2K delivers a real-time, ground-station–based benchmark for meteorological forecasting, addressing data quality and diversity gaps in existing datasets. It introduces Weather2K-R and Weather2K-S, offering 2,130 stations with 20 factors and 3 position constants across 40,896 steps, plus rigorous QC and two task directions. The authors propose MFMGCN, a multi-graph fusion model that combines four static graphs with a dynamic graph to capture intricate spatio-temporal correlations, and demonstrate superior performance and temporal robustness over baselines. This dataset and method provide a strong foundation to advance data-driven weather forecasting research and enable fairer, more comprehensive benchmarking.
Abstract
Weather forecasting is one of the cornerstones of meteorological work. In this paper, we present a new benchmark dataset named Weather2K, which aims to make up for the deficiencies of existing weather forecasting datasets in terms of real-time, reliability, and diversity, as well as the key bottleneck of data quality. To be specific, our Weather2K is featured from the following aspects: 1) Reliable and real-time data. The data is hourly collected from 2,130 ground weather stations covering an area of 6 million square kilometers. 2) Multivariate meteorological variables. 20 meteorological factors and 3 constants for position information are provided with a length of 40,896 time steps. 3) Applicable to diverse tasks. We conduct a set of baseline tests on time series forecasting and spatio-temporal forecasting. To the best of our knowledge, our Weather2K is the first attempt to tackle weather forecasting task by taking full advantage of the strengths of observation data from ground weather stations. Based on Weather2K, we further propose Meteorological Factors based Multi-Graph Convolution Network (MFMGCN), which can effectively construct the intrinsic correlation among geographic locations based on meteorological factors. Sufficient experiments show that MFMGCN improves both the forecasting performance and temporal robustness. We hope our Weather2K can significantly motivate researchers to develop efficient and accurate algorithms to advance the task of weather forecasting. The dataset can be available at https://github.com/bycnfz/weather2k/.
