Solar Multimodal Transformer: Intraday Solar Irradiance Predictor using Public Cameras and Time Series

Yanan Niu, Roy Sarkis, Demetri Psaltis, Mario Paolone, Christophe Moser, Luisa Lambertini

TL;DR

The paper addresses intraday solar irradiance forecasting in the 10-minute-to-hours range by introducing the Solar Multimodal Transformer (SMT), which fuses single-frame public camera imagery with historical GHI time series through an early-fusion transformer. A normalization step scales GHI by the daily maximum clear-sky value to emphasize sky clearness, improving forecast accuracy. SMT, including lightweight image-time-series integration and optional CNN/U-net hybrids, achieves a 25.95% RMSE reduction compared with Solcast over a 12-day test, and ablation/attention analyses provide insight into when and how each modality contributes. The work demonstrates strong practical potential for scalable, camera-agnostic solar forecasting with broad applicability in energy markets and grid planning.
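The normalization step mentioned above can be illustrated with a minimal sketch. It assumes the clear-sky GHI values for the day are already available from some clear-sky model; the function names `scale_by_daily_clear_sky_max` and `restore_ghi` are hypothetical and not taken from the paper, and the paper's exact procedure may differ in detail.

```python
import numpy as np

def scale_by_daily_clear_sky_max(ghi, clear_sky_ghi):
    """Divide one day's measured GHI by that day's maximum clear-sky GHI.

    Both inputs are 1-D sequences in W/m^2 covering the same day.
    The clear-sky values are assumed to come from an external model.
    """
    ghi = np.asarray(ghi, dtype=float)
    clear_sky_ghi = np.asarray(clear_sky_ghi, dtype=float)
    daily_max = clear_sky_ghi.max()
    if daily_max <= 0:
        raise ValueError("clear-sky GHI must have a positive daily maximum")
    return ghi / daily_max

def restore_ghi(scaled_ghi, clear_sky_ghi):
    """Invert the scaling to get back values in W/m^2."""
    daily_max = np.asarray(clear_sky_ghi, dtype=float).max()
    return np.asarray(scaled_ghi, dtype=float) * daily_max

# Example day: measured GHI reaches 80% of the clear-sky peak.
scaled = scale_by_daily_clear_sky_max([0.0, 400.0, 800.0], [0.0, 500.0, 1000.0])
print(scaled)  # [0.  0.4 0.8]
```

Because every day is divided by its own clear-sky peak, the scaled series mostly reflects how clear the sky is rather than the seasonal change in peak irradiance.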

Abstract

Accurate intraday solar irradiance forecasting is crucial for optimizing dispatch planning and electricity trading. For this purpose, we introduce a novel and effective approach that includes three distinguishing components from the literature: 1) the uncommon use of single-frame public camera imagery; 2) solar irradiance time series scaled with a proposed normalization step, which boosts performance; and 3) a lightweight multimodal model, called Solar Multimodal Transformer (SMT), that delivers accurate short-term solar irradiance forecasting by combining images and scaled time series. Benchmarking against Solcast, a leading solar forecasting service provider, our model improved prediction accuracy by 25.95%. Our approach allows for easy adaptation to various camera specifications, offering broad applicability for real-world solar forecasting challenges.
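As a rough illustration of how images and scaled time series can be combined in a single-stream model, the sketch below builds one token sequence from image patches and a GHI history vector. All shapes, the random projection weights, the choice of a single time series token placed last, and the name `build_token_sequence` are assumptions made for illustration; positional embeddings and the transformer encoder itself are omitted.

```python
import numpy as np

def build_token_sequence(image, ghi_history, patch_size, w_img, w_ts, cls_token):
    """Return a (1 + num_patches + 1, d_model) token matrix.

    image:       (H, W, C) array, H and W divisible by patch_size
    ghi_history: (T,) array of scaled past GHI values
    w_img:       (patch_size * patch_size * C, d_model) projection (hypothetical)
    w_ts:        (T, d_model) projection (hypothetical)
    cls_token:   (d_model,) learnable [CLS] vector
    """
    h, w, c = image.shape
    p = patch_size
    if h % p or w % p:
        raise ValueError("image size must be divisible by patch_size")
    # Cut the image into non-overlapping p x p patches and flatten each one.
    patches = (
        image.reshape(h // p, p, w // p, p, c)
        .transpose(0, 2, 1, 3, 4)
        .reshape(-1, p * p * c)
    )
    img_tokens = patches @ w_img                 # (num_patches, d_model)
    ts_token = (ghi_history @ w_ts)[None, :]     # (1, d_model)
    # Early fusion: one sequence holding [CLS], image tokens, time series token.
    return np.concatenate([cls_token[None, :], img_tokens, ts_token], axis=0)

rng = np.random.default_rng(0)
d_model, p, t = 8, 16, 6
tokens = build_token_sequence(
    image=rng.random((32, 32, 3)),
    ghi_history=rng.random(t),
    patch_size=p,
    w_img=rng.normal(size=(p * p * 3, d_model)),
    w_ts=rng.normal(size=(t, d_model)),
    cls_token=np.zeros(d_model),
)
print(tokens.shape)  # (6, 8)
```

In a full model this sequence would be passed to a transformer encoder, and the output at the [CLS] position would feed the forecast head.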

Paper Structure

This paper contains 11 sections, 3 equations, 3 figures, and 5 tables.

Figures (3)

  • Figure 1: Model overview of Solar Multimodal Transformer. Illustration inspired by Kim et al. (2021). This end-to-end, single-stream model uses a basic transformer encoder to integrate data from multiple modalities. It linearly projects patches from images and time series data (historical GHI and optional meteorological data) before feeding them into the transformer for information fusion. The model employs a typical class token [CLS] to extract crucial information for the final prediction.
  • Figure 2: SMT vs. Solcast, daily RMSE
  • Figure 3: Attention analysis using column patches for SMT: (a) Unprocessed images. (b) Patch-specific visualizations using attention weighted rollout. (c) Patch-specific visualizations from the last attention block. (d) Bar plots of weighted attention, quantifying the influence of different patches throughout the model’s layers. (e) Bar plots of attention from the last layer, illustrating the final focus before a forecast is made. The accentuated patches in panels (b, c) indicate areas of higher importance for the final prediction. To incorporate the contribution of time series along with column patches in these visualizations, panels (b, c) include an added column of pixels at the end. Similarly, the time series component is visualized as the last vector in panels (d, e), marked in red, to highlight its relative importance.
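For readers unfamiliar with rollout-style attention visualizations such as those in Figure 3, the following is a minimal sketch of standard attention rollout: heads are averaged, the identity is added to account for residual connections, rows are renormalized, and the per-layer matrices are multiplied together. This is the generic technique, not necessarily the weighted variant used in the paper; the token layout ([CLS] first, time series last) and the function names are assumptions.

```python
import numpy as np

def attention_rollout(attentions):
    """Combine per-layer attention maps into one token-to-token map.

    attentions: list of arrays, each (num_heads, num_tokens, num_tokens),
                ordered from the first layer to the last.
    """
    num_tokens = attentions[0].shape[-1]
    rollout = np.eye(num_tokens)
    for layer_attn in attentions:
        a = layer_attn.mean(axis=0)               # average over heads
        a = a + np.eye(num_tokens)                # account for the residual path
        a = a / a.sum(axis=-1, keepdims=True)     # keep rows summing to 1
        rollout = a @ rollout                     # accumulate through layers
    return rollout

def cls_contributions(rollout):
    """Attention flowing from [CLS] (token 0) to every other token."""
    return rollout[0, 1:]

# Toy example: 2 layers, 1 head, 4 tokens ([CLS], 2 patches, time series).
rng = np.random.default_rng(0)
layers = []
for _ in range(2):
    raw = rng.random((1, 4, 4))
    layers.append(raw / raw.sum(axis=-1, keepdims=True))
scores = cls_contributions(attention_rollout(layers))
print(scores.shape)  # (3,)
```

The last entry of `scores` would correspond to the time series token under the assumed layout, which is the quantity the bar plots in panels (d, e) highlight in red.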