FusionSF: Fuse Heterogeneous Modalities in a Vector Quantized Framework for Robust Solar Power Forecasting

Ziqing Ma; Wenwei Wang; Tian Zhou; Chao Chen; Bingqing Peng; Liang Sun; Rong Jin

TL;DR

The paper addresses accurate day-ahead solar power forecasting for data-scarce plants by introducing FusionSF, a vector-quantized, multi-modal Transformer framework that fuses historical power data, satellite imagery, and future numerical weather prediction (NWP) data. It combines Rotary Positional Encoding, patching, and residual Vector Quantization with a Cross Transformer that integrates the modalities and a decoder that generates next-step predictions. The authors release the MMSP dataset, demonstrate strong zero-shot performance, and report deployment of FusionSF across more than 300 plants totaling over 15 GW via the eForecaster platform, with improvements over state-of-the-art baselines. The work highlights the value of aligning heterogeneous data sources in solar forecasting, which supports more robust grid integration and potential cost savings through improved forecast accuracy.
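To make the residual Vector Quantization step mentioned above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes fixed, hand-written codebooks and squared Euclidean distance; the function names `nearest_code` and `residual_vq` are hypothetical, and the summary does not state the codebook sizes, number of stages, or training procedure that FusionSF uses.

```python
def nearest_code(vec, codebook):
    """Return (index, codeword) of the entry closest to vec in squared L2 distance."""
    def sq_dist(code):
        return sum((v - c) ** 2 for v, c in zip(vec, code))
    idx = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))
    return idx, codebook[idx]


def residual_vq(vec, codebooks):
    """Quantize vec in stages; each stage encodes what earlier stages left over.

    Returns the chosen index per stage and the reconstructed (quantized) vector,
    which is the sum of the selected codewords.
    """
    residual = list(vec)
    quantized = [0.0] * len(vec)
    indices = []
    for codebook in codebooks:
        idx, code = nearest_code(residual, codebook)
        indices.append(idx)
        quantized = [q + c for q, c in zip(quantized, code)]
        residual = [r - c for r, c in zip(residual, code)]
    return indices, quantized


# Illustrative codebooks (assumed values, not learned): a coarse stage, then a finer one.
example_codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, -0.25]],
]
example_indices, example_quantized = residual_vq([1.2, 0.8], example_codebooks)
```

In this toy setting a continuous latent vector is replaced by a short tuple of discrete indices, one per stage. In a trained model the codebooks would be learned jointly with the encoders rather than fixed by hand.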

Abstract

Accurate solar power forecasting is crucial for integrating photovoltaic plants into the electric grid, scheduling generation, and keeping the power grid secure. The problem is more demanding for newly installed solar plants that lack sufficient data. Current research predominantly relies on historical solar power data or numerical weather prediction in a single-modality format, ignoring the complementary information provided by different modalities. In this paper, we propose a multi-modality fusion framework that integrates historical power data, numerical weather prediction, and satellite images, significantly improving forecast performance. We introduce a vector quantized framework that aligns modalities with varying information densities, striking a balance between integrating sufficient information and averting model overfitting. Our framework demonstrates strong zero-shot forecasting capability, which is especially useful for newly installed plants. Moreover, we collect and release a multi-modal solar power (MMSP) dataset from real-world plants to further promote research on multi-modal solar forecasting algorithms. Our extensive experiments show that our model is robust and improves accuracy both in zero-shot forecasting and in scenarios rich with training data, surpassing leading models. We have incorporated it into our eForecaster platform and deployed it for more than 300 solar plants with a capacity of over 15 GW.

Paper Structure (38 sections, 10 equations, 12 figures, 6 tables, 1 algorithm)


Introduction
Related work
Deep networks for time series and spatiotemporal forecasting
Multi-modal solar forecasting
Zero-shot learning for time series
Methodology
FusionSF overall architecture
Feature encoding
Rotary Positional encoding
Patching & Masking
Vector quantization (VQ)
Transformer-based Encoder
Modality mixing
Benchmark Dataset
Historical time series modality
...and 23 more sections

Figures (12)

Figure 1: Left: An illustration of our proposed multi-modal framework. The three modalities include solar power historical data, satellite images, and NWP data. Right: Geographical locations of the 88 solar power plants and the zero-shot learning setting. The plants are grouped into sets of 10 and are represented in different colors.
Figure 2: FusionSF architecture. The contextual images are tokenized, randomly masked, vector quantized, and processed with Vision Transformer. The vector quantized solar power TS and NWP covariates are processed with Temporal Transformer. The three modalities are fused with Cross Transformer. In the decoder, the mixed latent representation is processed with Temporal Transformer to make the final output.
Figure 3: Prediction visualization from FusionSF and other baselines. The first row shows two 'Hard' cases and the second row shows two 'Easy' cases.
Figure 4: Radar plots for analyzing the model performance for zero-shot learning. The test set includes data from solar plants #0 to #9 and the training set varies. The metrics are rescaled for visual clarity. A larger radar plot indicates better performance.
Figure 5: Comparison of latent value distributions employing VQ for satellite images and TS data. The upper panels display the distributions with and without VQ for satellite images (Upper Left) and TS (Upper Right). Lower panels illustrate t-SNE visualizations of latent values corresponding to satellite images with (Lower Left) and without (Lower Right) VQ. Distinct colors denote disparate tokens.
...and 7 more figures
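The zero-shot setting described in the captions of Figures 1 and 4 (88 plants grouped into sets of 10, with plants #0 to #9 held out for testing while the training set varies) can be sketched as a split by plant rather than by time. This is only an illustration under stated assumptions: grouping by consecutive plant ID is assumed here, whereas the paper's actual grouping may follow geography, and the helper names `group_plants` and `zero_shot_split` are hypothetical.

```python
def group_plants(num_plants=88, group_size=10):
    """Partition plant IDs 0..num_plants-1 into consecutive groups (assumed ordering)."""
    ids = list(range(num_plants))
    return [ids[i:i + group_size] for i in range(0, num_plants, group_size)]


def zero_shot_split(groups, test_group=0, train_groups=None):
    """Hold out one group of plants entirely; train only on other groups.

    No data from the test plants is seen during training, which is what makes
    the evaluation zero-shot with respect to those plants.
    """
    if train_groups is None:
        train_groups = [g for g in range(len(groups)) if g != test_group]
    if test_group in train_groups:
        raise ValueError("test group must not appear in the training groups")
    train_ids = [pid for g in train_groups for pid in groups[g]]
    return train_ids, list(groups[test_group])


plant_groups = group_plants()
# Vary the training set as in Figure 4: here, train on groups 1 and 2 only.
train_ids, test_ids = zero_shot_split(plant_groups, test_group=0, train_groups=[1, 2])
```

Varying `train_groups` while keeping the test group fixed mirrors the idea of changing the training set and measuring performance on unseen plants.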
