
Weather2K: A Multivariate Spatio-Temporal Benchmark Dataset for Meteorological Forecasting Based on Real-Time Observation Data from Ground Weather Stations

Xun Zhu, Yutong Xiong, Ming Wu, Gaozhen Nie, Bin Zhang, Ziheng Yang

TL;DR

Weather2K is a benchmark for meteorological forecasting built from real-time ground-station observations, addressing data quality and diversity gaps in existing datasets. It introduces Weather2K-R and Weather2K-S, offering 2,130 stations with 20 meteorological factors and 3 position constants across 40,896 hourly time steps, plus rigorous quality control and two task directions. The authors propose MFMGCN, a multi-graph fusion model that combines four static graphs with a dynamic graph to capture spatio-temporal correlations, and report better forecasting performance and temporal robustness than the baselines. The dataset and method are intended as a foundation for data-driven weather forecasting research and for fairer, more comprehensive benchmarking.

Abstract

Weather forecasting is one of the cornerstones of meteorological work. In this paper, we present a new benchmark dataset named Weather2K, which aims to make up for the deficiencies of existing weather forecasting datasets in terms of timeliness, reliability, and diversity, as well as the key bottleneck of data quality. Specifically, Weather2K has the following features: 1) Reliable and real-time data. The data is collected hourly from 2,130 ground weather stations covering an area of 6 million square kilometers. 2) Multivariate meteorological variables. 20 meteorological factors and 3 constants for position information are provided with a length of 40,896 time steps. 3) Applicable to diverse tasks. We conduct a set of baseline tests on time series forecasting and spatio-temporal forecasting. To the best of our knowledge, Weather2K is the first attempt to tackle the weather forecasting task by taking full advantage of the strengths of observation data from ground weather stations. Based on Weather2K, we further propose the Meteorological Factors based Multi-Graph Convolution Network (MFMGCN), which can effectively construct the intrinsic correlation among geographic locations based on meteorological factors. Extensive experiments show that MFMGCN improves both forecasting performance and temporal robustness. We hope Weather2K can motivate researchers to develop efficient and accurate algorithms to advance the task of weather forecasting. The dataset is available at https://github.com/bycnfz/weather2k/.
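
To make the dataset dimensions in the abstract concrete, here is a minimal Python sketch of how such data might be sized and sliced into forecasting samples. Only the counts (2,130 stations, 20 factors, 40,896 hourly steps) come from the abstract. The (time, station, factor) array layout, the float32 storage, the `sliding_windows` helper, and the 24-step input / 12-step horizon are assumptions for illustration, not the released file format or the authors' experimental setup.

```python
import numpy as np

# Counts stated in the abstract.
NUM_STATIONS = 2130
NUM_FACTORS = 20
NUM_STEPS = 40896  # hourly observations

# Hourly sampling means the record spans NUM_STEPS / 24 days.
days_covered = NUM_STEPS / 24

# Assumed layout (time, station, factor) stored as float32 (4 bytes each).
full_size_bytes = NUM_STEPS * NUM_STATIONS * NUM_FACTORS * 4


def sliding_windows(series, input_len, horizon):
    """Cut a (T, ...) array into overlapping (input, target) pairs.

    Hypothetical helper: each sample uses `input_len` consecutive steps
    as input and the following `horizon` steps as the target.
    """
    n_samples = series.shape[0] - input_len - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for the requested window")
    inputs = np.stack([series[i:i + input_len] for i in range(n_samples)])
    targets = np.stack(
        [series[i + input_len:i + input_len + horizon] for i in range(n_samples)]
    )
    return inputs, targets


# Toy stand-in for the real data: 100 steps, 5 stations, 20 factors.
rng = np.random.default_rng(0)
toy = rng.normal(size=(100, 5, NUM_FACTORS)).astype(np.float32)

x, y = sliding_windows(toy, input_len=24, horizon=12)
print(days_covered)      # 1704.0
print(full_size_bytes)   # 6968678400
print(x.shape, y.shape)  # (65, 24, 5, 20) (65, 12, 5, 20)
```

Under these assumptions the full factor array would occupy roughly 7 GB in memory, so a practical loader would probably read per-station or per-factor slices rather than materialise everything at once.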

Paper Structure (51 sections, 17 equations, 7 figures, 7 tables)


Figures (7)

  • Figure 1: The distribution of the 2,130 ground weather stations that supply observation data for Weather2K.
  • Figure 2: (a) The overview of Meteorological Factors based Multi-Graph Convolution Network (MFMGCN). (b) The architecture of spatio-temporal block (ST-block). (c) The architecture of the multi-branch temporal convolution.
  • Figure 3: Overall method comparison in forecasting (a) temperature, (b) visibility, and (c) humidity.
  • Figure 4: Box plots of 20 meteorological factors in Weather2K-R and Weather2K-S.
  • Figure 5: CDF of 20 meteorological factors in Weather2K-S.
  • ...and 2 more figures