VN-Net: Vision-Numerical Fusion Graph Convolutional Network for Sparse Spatio-Temporal Meteorological Forecasting

Yutong Xiong; Xun Zhu; Ming Wu; Weiqing Li; Fanbin Mo; Chuang Zhang; Bin Zhang

TL;DR

VN-Net addresses sparse spatio-temporal meteorological forecasting by fusing ground-station numerical data with time-series satellite imagery. It introduces a dual-branch architecture: a Numerical Graph Convolutional Network for adaptive spatio-temporal numerical modeling, and a Vision-LSTM with a Multi-Scale Channel-Spatial module (MSCSM) that extracts satellite-based visual features. The two branches are connected by a Double Query Attention Module, which feeds a GCN-based decoder. The approach achieves state-of-the-art results on Weather2K across multiple regions and meteorological factors, with ablations showing the value of time embeddings, MSCSM vision processing, and learnable cross-modal queries. The work also provides interpretation metrics that quantify the contribution of meteorological factors and static location information, showing how incorporating vision data affects forecast accuracy and interpretability.
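To make the idea of query-based cross-modal fusion concrete, the following is a minimal sketch of generic scaled dot-product cross-attention in plain Python. It is not the paper's Double Query Attention Module, whose internals are not described here. The function names, the toy feature values, and the choice of one station-derived query set plus one stand-in "learnable" query set are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query returns a weighted mix of value rows."""
    scale = math.sqrt(len(queries[0]))
    fused = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(feature width).
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value rows, one output channel at a time.
        fused.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return fused


# Hypothetical toy inputs (assumptions, not data from the paper):
# two station embeddings, one stand-in learnable query, three vision features.
station_queries = [[1.0, 0.0], [0.0, 1.0]]
learned_queries = [[0.5, 0.5]]
vision_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

from_stations = cross_attention(station_queries, vision_feats, vision_feats)
from_learned = cross_attention(learned_queries, vision_feats, vision_feats)
```

Each output row is a convex combination of the vision feature rows, so the fused features stay within the range of the inputs. In a trained model the query vectors would be parameters updated by gradient descent rather than fixed constants.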

Abstract

Sparse meteorological forecasting is indispensable for fine-grained weather forecasting and deserves extensive attention. Recent studies have highlighted the potential of spatio-temporal graph convolutional networks (ST-GCNs) in predicting numerical data from ground weather stations. However, satellite vision data, which is among the highest-fidelity and lowest-latency data available, has not yet been applied in ST-GCNs, and few studies have demonstrated the effectiveness of combining these two modalities for sparse meteorological forecasting. To this end, we introduce the Vision-Numerical Fusion Graph Convolutional Network (VN-Net), which mainly utilizes: 1) a Numerical-GCN (N-GCN) to adaptively model the static and dynamic patterns of spatio-temporal numerical data; 2) a Vision-LSTM Network (V-LSTM) to capture multi-scale joint channel and spatial features from time-series satellite images; 3) a Double Query Attention Module to connect the two branches; and 4) a GCN-based decoder to generate hourly predictions of specified meteorological factors. As far as we know, VN-Net is the first attempt to introduce a GCN method that utilizes multi-modal data to better handle sparse spatio-temporal meteorological forecasting. Our experiments on the Weather2K dataset show that VN-Net outperforms state-of-the-art methods by a significant margin on mean absolute error (MAE) and root mean square error (RMSE) for temperature, relative humidity, and visibility forecasting. Furthermore, we conduct interpretation analysis and design quantitative evaluation metrics to assess the impact of incorporating vision data.
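For readers less familiar with the two reported metrics, here is a minimal sketch of MAE and RMSE using their standard definitions. The function names and the sample temperature values are hypothetical and are not taken from the paper's experiments.

```python
import math


def mae(pred, true):
    """Mean absolute error: average magnitude of the forecast errors."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def rmse(pred, true):
    """Root mean square error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


# Hypothetical hourly temperature forecasts and observations in degrees Celsius.
forecast = [21.0, 19.5, 18.0]
observed = [20.0, 20.0, 20.0]

temperature_mae = mae(forecast, observed)
temperature_rmse = rmse(forecast, observed)
```

Because RMSE squares each error before averaging, it is always at least as large as MAE on the same data, and the gap between the two grows when a few forecasts are far off.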

Paper Structure (37 sections, 13 equations, 5 figures, 7 tables)

Introduction
Related Work
Sparse Meteorological Forecasting
Multi-modal Meteorological Forecasting
Method
Task Setting
Numerical Graph Convolutional Network
Spatial Dependency Modeling
Embedding Learning
Static Graph Learning
Dynamic Graph Learning
Node Adaptive Parameter Learning
Temporal Dependency Modeling
Vision-LSTM Network
Vision-Numerical Fusion
...and 22 more sections

Figures (5)

Figure 1: The architecture and key modules of VN-Net. (a) Overall design. (b) SDGRU. (c) V-LSTM. (d) MSCSM.
Figure 2: Three regions of the datasets. Northeast (38.3°N-44.7°N, 117.3°E-123.7°E), Southwest (26.3°N-32.7°N, 100.3°E-106.7°E), Southeast (26.8°N-33.2°N, 116.8°E-123.2°E) with 60, 96, and 139 ground weather stations, respectively.
Figure 3: Factor contribution results of the SW region. (a) Temperature, uni-modal. (b) Relative Humidity, uni-modal. (c) Visibility, uni-modal. (d) Temperature, multi-modal. (e) Relative Humidity, multi-modal. (f) Visibility, multi-modal. Factor names follow Weather2K notation. The legend labels -$i$D and Static denote the contribution of the data from $i$ days before forecast initialization and of the 3 geolocation constants, respectively.
Figure 4: Factor contribution results of the NE region. (a) Temperature, uni-modal. (b) Relative Humidity, uni-modal. (c) Visibility, uni-modal. (d) Temperature, multi-modal. (e) Relative Humidity, multi-modal. (f) Visibility, multi-modal.
Figure 5: Factor contribution results of the SE region. (a) Temperature, uni-modal. (b) Relative Humidity, uni-modal. (c) Visibility, uni-modal. (d) Temperature, multi-modal. (e) Relative Humidity, multi-modal. (f) Visibility, multi-modal.
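
The region bounds listed in the Figure 2 caption can be expressed as a simple lookup. The sketch below assigns a station coordinate to one of the three regions. The function name, the dictionary layout, and the choice to treat the bounds as inclusive are assumptions; the latitude and longitude limits are the ones given in the caption.

```python
# Region bounds from the Figure 2 caption:
# (lat_min, lat_max, lon_min, lon_max) in degrees North and degrees East.
REGIONS = {
    "Northeast": (38.3, 44.7, 117.3, 123.7),
    "Southwest": (26.3, 32.7, 100.3, 106.7),
    "Southeast": (26.8, 33.2, 116.8, 123.2),
}


def region_of(lat, lon):
    """Return the name of the region containing (lat, lon), or None if outside all three."""
    for name, (lat_min, lat_max, lon_min, lon_max) in REGIONS.items():
        # Inclusive bounds are an assumption; the caption does not specify edge handling.
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max:
            return name
    return None


# Hypothetical station coordinates used only to exercise the lookup.
sample_region = region_of(30.0, 104.0)
```

The three boxes do not overlap, so a coordinate matches at most one region and the order of the dictionary does not affect the result.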
