Predictive Uncertainty in Short-Term PV Forecasting under Missing Data: A Multiple Imputation Approach

Parastoo Pashmchi, Jérôme Benoit, Motonobu Kanagawa

Abstract

Missing values are common in photovoltaic (PV) power data, yet the uncertainty they induce is not propagated into predictive distributions. We develop a framework that incorporates missing-data uncertainty into short-term PV forecasting by combining stochastic multiple imputation with Rubin's rule. The approach is model-agnostic and can be integrated with standard machine-learning predictors. Empirical results show that ignoring missing-data uncertainty leads to overly narrow prediction intervals. Accounting for this uncertainty improves interval calibration while maintaining comparable point prediction accuracy. These results demonstrate the importance of propagating imputation uncertainty in data-driven PV forecasting.
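The pooling step named in the abstract can be illustrated with a minimal sketch. It assumes each of the B imputed datasets yields a predictive mean and a predictive variance per forecast time, and that these are combined with the standard form of Rubin's rule: total variance equals the average within-imputation variance plus (1 + 1/B) times the between-imputation variance. The function names `pool_rubin` and `normal_interval` are hypothetical, and the normal-approximation interval is an illustrative choice, not necessarily the interval construction used in the paper.

```python
import numpy as np


def pool_rubin(means, variances):
    """Pool B per-imputation predictions with Rubin's rule.

    means, variances: array-like of shape (B, T), one row per imputed
    dataset and one column per forecast time (hypothetical layout).
    Returns the pooled mean and the total variance, each of shape (T,).
    """
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    b = means.shape[0]
    if b < 2:
        raise ValueError("Rubin's rule needs at least two imputations")

    pooled_mean = means.mean(axis=0)
    within = variances.mean(axis=0)        # average predictive variance
    between = means.var(axis=0, ddof=1)    # spread of means across imputations
    total = within + (1.0 + 1.0 / b) * between
    return pooled_mean, total


def normal_interval(pooled_mean, total_var, z=1.959963984540054):
    """95% interval under a normal approximation (an assumption here)."""
    half_width = z * np.sqrt(total_var)
    return pooled_mean - half_width, pooled_mean + half_width


# Toy example: B = 3 imputations, T = 2 forecast times (made-up numbers).
toy_means = [[1.0, 4.0], [2.0, 4.0], [3.0, 4.0]]
toy_vars = [[0.5, 0.2], [0.5, 0.2], [0.5, 0.2]]
toy_pooled, toy_total = pool_rubin(toy_means, toy_vars)
toy_lower, toy_upper = normal_interval(toy_pooled, toy_total)
```

In the toy example the imputations disagree at the first forecast time, so the between-imputation term widens that interval. At the second forecast time they agree, so the total variance reduces to the within-imputation variance, which is what single imputation would report.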

Paper Structure (23 sections, 35 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Example of missing observations in real PV power measurements collected at EURECOM, illustrating complete-day gaps and prolonged zero-output periods consistent with system outages or failures.
  • Figure 2: Comparison of single and multiple imputation for one-hour-ahead prediction with a Random Forest model. The shaded regions show the 95% prediction intervals. Single imputation gives overly narrow intervals, whereas the proposed multiple-imputation approach accounts for missing-value uncertainty and gives wider intervals.
  • Figure 3: Dataset from the EU GRIDouble project used in our experiments. Top: original hourly DC power. Middle: DC power after block-wise removal of several contiguous weeks (29.5% missing). Bottom: corresponding hourly irradiation (GHI).
  • Figure 4: 95% prediction intervals ($B=5$) for Random Forest under (a) gamma and (b) normal predictive distributions, shown for three imputation setups: (1) SI train & test, (2) SI train with MI test, and (3) MI train & test. Black thick curve: ground truth; thick colored curve: predictive means; shaded region: 95% prediction intervals.
  • Figure 5: 95% prediction intervals ($B=5$) for MLP under (a) gamma and (b) normal predictive distributions, shown for three imputation setups: (1) SI train & test, (2) SI train with MI test, and (3) MI train & test. Black thick curve: ground truth; thick colored curve: predictive means; shaded region: 95% prediction intervals.
  • ...and 2 more figures