Predictive Uncertainty in Short-Term PV Forecasting under Missing Data: A Multiple Imputation Approach

Parastoo Pashmchi; Jérôme Benoit; Motonobu Kanagawa

Abstract

Missing values are common in photovoltaic (PV) power data, yet the uncertainty they induce is not propagated into predictive distributions. We develop a framework that incorporates missing-data uncertainty into short-term PV forecasting by combining stochastic multiple imputation with Rubin's rule. The approach is model-agnostic and can be integrated with standard machine-learning predictors. Empirical results show that ignoring missing-data uncertainty leads to overly narrow prediction intervals. Accounting for this uncertainty improves interval calibration while maintaining comparable point prediction accuracy. These results demonstrate the importance of propagating imputation uncertainty in data-driven PV forecasting.
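As a rough illustration of how imputation uncertainty can widen a prediction interval, the sketch below pools per-imputation predictive means and variances with the textbook form of Rubin's rule and builds a normal-based interval from the result. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `pool_rubin` and `normal_interval`, the use of `B` for the number of imputations, and all numbers are hypothetical.

```python
from statistics import NormalDist, fmean, variance


def pool_rubin(means, variances):
    """Pool B per-imputation predictive means and variances (textbook Rubin's rule)."""
    B = len(means)
    if B < 2 or len(variances) != B:
        raise ValueError("need at least two imputations with matching lengths")
    pooled_mean = fmean(means)
    within = fmean(variances)      # average predictive variance across imputations
    between = variance(means)      # sample variance of the means (B - 1 denominator)
    total = within + (1 + 1 / B) * between
    return pooled_mean, total


def normal_interval(mean, var, level=0.95):
    """Symmetric normal-based interval for a given mean and variance."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * var ** 0.5
    return mean - half_width, mean + half_width


# Hypothetical predictive means and variances from B = 5 imputed datasets.
means = [2.0, 2.4, 1.8, 2.2, 2.1]
variances = [0.04, 0.05, 0.04, 0.06, 0.05]

pooled_mean, total_var = pool_rubin(means, variances)
lo_mi, hi_mi = normal_interval(pooled_mean, total_var)

# Single imputation keeps only one imputed dataset and so ignores the between term.
lo_si, hi_si = normal_interval(means[0], variances[0])

print(f"MI interval: [{lo_mi:.3f}, {hi_mi:.3f}]")
print(f"SI interval: [{lo_si:.3f}, {hi_si:.3f}]")
```

The between-imputation term is what single imputation drops, which is one way to read the overly narrow intervals described in the abstract.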

Paper Structure (23 sections, 35 equations, 7 figures, 3 tables)

INTRODUCTION
Existing Works on Missing Values in PV Systems
Contributions
Forecast Setup without Missing Values
One-hour-ahead PV Power Forecasting
Machine Learning Training
Machine Learning Forecast on Test Data
Multiple Imputation for PV Forecasting
Stochastic Imputation of Missing PV Power Values
Multiple Imputation Framework
Imputing Missing Training Data
Imputing Missing Test Input Features
Aggregation by Rubin's Rule
Predictive Intervals
Normal-based Intervals
...and 8 more sections

Figures (7)

Figure 1: Example of missing observations in real PV power measurements collected at EURECOM, illustrating complete-day gaps and prolonged zero-output periods consistent with system outages or failures.
Figure 2: Comparison of single and multiple imputation for one-hour-ahead prediction with a Random Forest model. The shaded regions show the 95% prediction intervals. Single imputation gives overly narrow intervals, whereas the proposed multiple-imputation approach accounts for missing-value uncertainty and gives wider intervals.
Figure 3: Dataset from the EU GRIDouble project used in our experiments. Top: original hourly DC power. Middle: DC power after block-wise removal of several contiguous weeks (29.5% missing). Bottom: corresponding hourly irradiation (GHI).
Figure 4: 95% prediction intervals ($B=5$) for Random Forest under (a) gamma and (b) normal predictive distributions, shown for three imputation setups: (1) single imputation (SI) for train & test, (2) SI train with multiple imputation (MI) test, and (3) MI train & test. Black thick curve: ground truth; thick colored curve: predictive means; shaded region: 95% prediction intervals.
Figure 5: 95% prediction intervals ($B=5$) for MLP under (a) gamma and (b) normal predictive distributions, shown for three imputation setups: (1) SI train & test, (2) SI train with MI test, and (3) MI train & test. Black thick curve: ground truth; thick colored curve: predictive means; shaded region: 95% prediction intervals.
...and 2 more figures
