SolNet: Open-source deep learning models for photovoltaic power forecasting across the globe

Joris Depoortere, Johan Driesen, Johan Suykens, Hussain Syed Kazmi

TL;DR

SolNet tackles the data scarcity challenge in global PV power forecasting by pretraining a long short-term memory (LSTM) forecaster on abundant synthetic data from PVGIS and then fine-tuning with limited observational data. The study provides a detailed, open-source pipeline and evaluates transfer learning across 300+ sites in the Netherlands, Australia, and Belgium, revealing that transfer learning yields the largest gains when target data are scarce, and that weather covariates substantially boost performance. The work also investigates how seasonality and source-domain misspecification affect outcomes, offering practical guidelines for practitioners. By enabling zero-shot forecasting anywhere with available synthetic and forecast data, SolNet has the potential to greatly expand accurate solar forecasting globally, with a public Python package and thorough evaluation framework to support adoption.

Abstract

Deep learning models have gained increasing prominence in recent years in the field of solar photovoltaic (PV) forecasting. One drawback of these models is that they require a lot of high-quality data to perform well. This is often infeasible in practice, due to poor measurement infrastructure in legacy systems and the rapid build-up of new solar systems across the world. This paper proposes SolNet: a novel, general-purpose, multivariate solar power forecaster that addresses these challenges with a two-step forecasting pipeline, which first applies transfer learning from abundant synthetic data generated from PVGIS and then fine-tunes on observational data. Using actual production data from hundreds of sites in the Netherlands, Australia and Belgium, we show that SolNet improves forecasting performance in data-scarce settings and outperforms baseline models. We find the benefits of transfer learning to be strongest when only limited observational data is available. At the same time, we provide several guidelines and considerations for transfer learning practitioners, as our results show that weather data, seasonal patterns, the amount of synthetic data and possible misspecification of the source location can have a major impact on the results. The SolNet models created in this way are applicable to any land-based solar photovoltaic system across the planet where simulated and observed data can be combined to obtain improved forecasting capabilities.
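
The two-step pipeline described in the abstract (pretrain on abundant synthetic data, then fine-tune on scarce observations) can be illustrated with a deliberately small sketch. This is not the SolNet implementation: a linear autoregressive model trained by gradient descent stands in for the LSTM, and toy half-sine series stand in for both the PVGIS data and the measured production. Every function name, series shape and hyperparameter below is an assumption made for illustration.

```python
import math

def pv_like_series(n_hours, peak, shift=0.0):
    """Toy stand-in for a PV signal: the positive half of a daily sine wave."""
    return [max(0.0, peak * math.sin(2 * math.pi * (h - 6 - shift) / 24))
            for h in range(n_hours)]

def make_windows(series, n_lags):
    """Turn a series into (lag vector, next value) training pairs."""
    return [(series[i - n_lags:i], series[i]) for i in range(n_lags, len(series))]

def predict(weights, lags):
    # weights[0] is the bias; weights[1:] multiply the lagged values
    return weights[0] + sum(w * x for w, x in zip(weights[1:], lags))

def mse(weights, windows):
    return sum((predict(weights, x) - y) ** 2 for x, y in windows) / len(windows)

def train(weights, windows, epochs, lr):
    """Full-batch gradient descent on the MSE; returns a new weight list."""
    w = list(weights)
    n = len(windows)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in windows:
            err = predict(w, x) - y
            grad[0] += 2 * err / n
            for j, xj in enumerate(x):
                grad[j + 1] += 2 * err * xj / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w

N_LAGS = 3
# Hypothetical source domain: 30 days of "synthetic" data.
source = make_windows(pv_like_series(24 * 30, peak=1.0), N_LAGS)
# Hypothetical target domain: 3 days of "observed" data from a slightly different system.
target = make_windows(pv_like_series(24 * 3, peak=0.8, shift=1.0), N_LAGS)

init = [0.0] * (N_LAGS + 1)
pretrained = train(init, source, epochs=100, lr=0.05)     # step 1: source training
transfer = train(pretrained, target, epochs=20, lr=0.01)  # step 2: fine-tuning
direct = train(init, target, epochs=20, lr=0.01)          # no transfer, for comparison

print("pretrained only, on target:", mse(pretrained, target))
print("transfer (fine-tuned):     ", mse(transfer, target))
print("direct (target only):      ", mse(direct, target))
```

The only difference between `transfer` and `direct` is the starting point of the weights, which is the essence of the comparison the paper makes between transfer models and target-only models.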

Paper Structure

This paper contains 26 sections, 4 equations, 8 figures, and 1 table.

Figures (8)

  • Figure 1: Complete assessment of transfer performance. The initial performance measures the performance of a model without any training in the domain of interest (the target domain). The learning performance indicates the performance gain we can expect over time as more training data becomes available. Finally, the asymptotic performance is the 'permanent' performance differential between the models when sufficient data is available. The shape of the figure is indicative of our hypothesised findings.
  • Figure 2: The workflow of the experiments. In the transfer learning case we train in the (synthetic) source domain before fine-tuning in the target domain. In direct (i.e. target) learning, we train directly on the target domain. The benchmark is a persistence model and does not require any training. The periods on the left are specifically those of the NL case, with target data starting in 2020 and ending in August 2021.
  • Figure 3: Forecast error as a function of target data availability and use of transfer learning for univariate models. The thick line represents the median performance at each time step. The shaded areas represent the performance of all individual systems, with the outer edges being the worst/best performing systems. The results indicate that transfer learning models consistently outperform both naive baselines and target models (i.e. models trained without transfer).
  • Figure 4: Forecast error as a function of target data availability and use of transfer learning for multivariate models using weather data. The target model is trained on weather forecast data; the transfer model is trained on actual weather data and fine-tuned on weather forecast data. Results are consistent with the univariate models, but with a strong increase in overall performance.
  • Figure 5: Transfer learning results under increasing distance from the target for a subset of ten NL locations. The performance is compared to the median baseline of these ten locations, as well as the median 'correct' target model, i.e. a target model that is made with no misspecification.
  • ...and 3 more figures
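
The persistence benchmark mentioned in the caption of Figure 2 needs no training, which a short sketch makes concrete. The caption does not say which form of persistence the paper uses, so the version below is an assumption: it repeats the values observed `lag` steps earlier (for example, the same hour on the previous day for hourly data with `lag=24`). The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def persistence_forecast(history, horizon, lag=24):
    """Forecast each future step with the value observed `lag` steps earlier."""
    if len(history) < lag:
        raise ValueError("history must contain at least `lag` values")
    start = len(history) - lag
    # For horizons longer than `lag`, the last observed cycle is repeated.
    return [history[start + (h % lag)] for h in range(horizon)]

def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Toy example with a 4-step "day" (made-up values, for illustration only).
yesterday = [0.0, 2.0, 3.0, 0.5]
today = [0.0, 1.5, 3.5, 0.5]
baseline = persistence_forecast(yesterday, horizon=4, lag=4)
print(baseline, rmse(today, baseline))
```

With `lag=1` the same function reduces to the simplest persistence rule, which repeats the last observed value for every future step.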