VN-Net: Vision-Numerical Fusion Graph Convolutional Network for Sparse Spatio-Temporal Meteorological Forecasting
Yutong Xiong, Xun Zhu, Ming Wu, Weiqing Li, Fanbin Mo, Chuang Zhang, Bin Zhang
TL;DR
VN-Net addresses sparse spatio-temporal meteorological forecasting by fusing numerical ground-station data with time-series satellite imagery. It uses a dual-branch architecture: a Numerical Graph Convolutional Network (N-GCN) that adaptively models the spatio-temporal numerical data, and a Vision-LSTM (V-LSTM) with a Multi-Scale Channel-Spatial Module (MSCSM) that extracts visual features from the satellite images. A Double Query Attention Module connects the two branches ahead of a GCN-based decoder. The approach achieves state-of-the-art results on Weather2K across multiple regions and meteorological factors, and ablations show the value of time embeddings, MSCSM vision processing, and learnable cross-modal queries. The work also provides interpretation metrics that quantify the contribution of meteorological factors and static location information, and that show how incorporating vision data affects forecast accuracy.
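To make the idea of cross-modal fusion with learnable queries more concrete, the following is a minimal NumPy sketch of generic scaled dot-product cross-attention. It is not the authors' Double Query Attention Module, whose internals this summary does not specify. The function names, the feature sizes, the choice of two query sets (station features plus a separate learnable set), and the final concatenation are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, context):
    """Generic scaled dot-product attention (hypothetical helper).

    Each row of `queries` attends over the rows of `context`, which
    serve as both keys and values in this simplified sketch.
    """
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)  # (n_queries, n_context)
    weights = softmax(scores, axis=-1)         # each row sums to 1
    return weights @ context, weights


rng = np.random.default_rng(0)
n_stations, n_patches, d = 5, 16, 8            # assumed sizes

# Random stand-ins for the outputs of the two branches.
numerical = rng.normal(size=(n_stations, d))   # station features
vision = rng.normal(size=(n_patches, d))       # satellite image features

# Assumption: a second query set that would be trained; random here.
learned_queries = rng.normal(size=(n_stations, d))

# Query set 1: station features attend over the vision features.
from_numerical, w1 = cross_attention(numerical, vision)
# Query set 2: learnable queries attend over the same vision features.
from_learned, w2 = cross_attention(learned_queries, vision)

# Assumed fusion step: concatenate per-station features for a decoder.
fused = np.concatenate([numerical, from_numerical, from_learned], axis=-1)
print(fused.shape)
```

In this sketch each station ends up with a feature vector of size 3 * d, which a downstream decoder could consume. A trained model would learn `learned_queries` together with projection matrices for queries, keys, and values, which are omitted here for brevity.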
Abstract
Sparse meteorological forecasting is indispensable for fine-grained weather forecasting and deserves extensive attention. Recent studies have highlighted the potential of spatio-temporal graph convolutional networks (ST-GCNs) in predicting numerical data from ground weather stations. However, although satellite vision data is among the highest-fidelity and lowest-latency data sources, its application in ST-GCNs remains unexplored, and few studies have demonstrated the effectiveness of combining these two modalities for sparse meteorological forecasting. To this end, we introduce the Vision-Numerical Fusion Graph Convolutional Network (VN-Net), which mainly consists of: 1) a Numerical-GCN (N-GCN) to adaptively model the static and dynamic patterns of spatio-temporal numerical data; 2) a Vision-LSTM Network (V-LSTM) to capture multi-scale joint channel and spatial features from time-series satellite images; 3) a Double Query Attention Module that connects the numerical and vision branches; and 4) a GCN-based decoder to generate hourly predictions of specified meteorological factors. To the best of our knowledge, VN-Net is the first attempt to apply a GCN-based method to multi-modal data for sparse spatio-temporal meteorological forecasting. Our experiments on the Weather2K dataset show that VN-Net outperforms state-of-the-art methods by a significant margin in mean absolute error (MAE) and root mean square error (RMSE) for temperature, relative humidity, and visibility forecasting. Furthermore, we conduct an interpretation analysis and design quantitative evaluation metrics to assess the impact of incorporating vision data.
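For readers unfamiliar with the two reported metrics, the sketch below computes MAE and RMSE in plain Python using their standard definitions. The temperature values are made up for illustration and are not taken from Weather2K or from the paper's results.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average of |observed - forecast|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error: sqrt of the average squared difference."""
    mean_sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mean_sq)


# Hypothetical hourly temperatures (degrees Celsius) at one station.
observed = [20.0, 22.0, 19.0, 21.0]
forecast = [21.0, 22.0, 17.0, 21.0]

mae_value = mae(observed, forecast)    # (1 + 0 + 2 + 0) / 4 = 0.75
rmse_value = rmse(observed, forecast)  # sqrt((1 + 0 + 4 + 0) / 4) = sqrt(1.25)
print(mae_value, rmse_value)
```

Because RMSE squares each error before averaging, it penalizes the single 2-degree miss more heavily than MAE does, which is why the two metrics are usually reported together.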
