Marginalize, Rather than Impute: Probabilistic Wind Power Forecasting with Incomplete Data
Honglin Wen, Pierre Pinson, Jie Gu, Zhijian Jin
TL;DR
Missing data in wind power forecasting can bias models and obscure forecast uncertainty. The authors propose a flow-augmented variational autoencoder (VAE) that learns the joint distribution of features and targets and marginalizes missing features at forecast time instead of imputing them. Training maximizes an importance-weighted (IWAE) objective on the observed entries only, and operational forecasts are generated via importance resampling to produce calibrated predictive scenarios. Empirical results on WIND Toolkit data show lower CRPS and favorable calibration relative to impute-then-predict baselines, with scalable training and efficient real-time forecasting.
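The forecasting step can be illustrated with a minimal sketch. It rests on several assumptions that are not from the paper. A hypothetical one-dimensional linear-Gaussian latent model stands in for the flow-augmented VAE. The prior serves as the importance proposal in place of a trained encoder. The names `A`, `B`, `SX`, `SY` and `forecast_scenarios` are illustrative only. The sketch shows three things: missing features simply drop out of the importance weights, the same weights give a K-sample IWAE estimate of log p(x_o), and resampling latents in proportion to those weights and then decoding them yields predictive scenarios for the target.

```python
import math
import random

# Hypothetical toy decoder (illustration only, not the paper's flow-augmented VAE):
#   z ~ N(0, 1);  x_j | z ~ N(A[j] * z, SX^2);  y | z ~ N(B * z, SY^2)
A = [1.0, 0.8, 1.2]  # loadings of three hypothetical input features
B = 1.5              # loading of the forecast target (wind power)
SX, SY = 0.3, 0.2    # observation noise standard deviations


def log_normal(v, mean, std):
    """Log density of N(mean, std^2) evaluated at v."""
    return -0.5 * math.log(2.0 * math.pi * std ** 2) - (v - mean) ** 2 / (2.0 * std ** 2)


def log_lik_observed(x, z):
    """log p(x_o | z). Missing features (None) are skipped: with a decoder that
    factorizes over features, integrating p(x_m | z) over x_m gives 1, so
    marginalizing them amounts to dropping their terms."""
    return sum(log_normal(xj, a * z, SX) for xj, a in zip(x, A) if xj is not None)


def log_mean_exp(values):
    """Numerically stable log((1/K) * sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def forecast_scenarios(x, n_scenarios, k=2000, rng=random):
    """Draw target scenarios from p(y | x_o) by importance resampling.

    Returns (scenarios, iwae), where iwae is the K-sample importance-weighted
    estimate of log p(x_o)."""
    # Proposal q(z | x_o): the prior N(0, 1) here (an assumption for brevity;
    # a trained encoder would normally supply a sharper proposal).
    zs = [rng.gauss(0.0, 1.0) for _ in range(k)]
    # log w = log p(z) + log p(x_o | z) - log q(z | x_o) = log p(x_o | z) when q = prior
    log_w = [log_lik_observed(x, z) for z in zs]
    iwae = log_mean_exp(log_w)
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    # Importance resampling: keep latent draws in proportion to their weights
    picked = rng.choices(zs, weights=weights, k=n_scenarios)
    # Decode each resampled latent into a target scenario y ~ p(y | z)
    scenarios = [rng.gauss(B * z, SY) for z in picked]
    return scenarios, iwae


# Usage: the second feature is missing (None) and is marginalized, not imputed.
rng = random.Random(0)
ys, iwae = forecast_scenarios([0.9, None, 1.1], n_scenarios=5000, rng=rng)
print(f"scenario mean={sum(ys) / len(ys):.3f}, IWAE log p(x_o)={iwae:.3f}")
```

Because this toy model is linear-Gaussian, the scenario mean and the IWAE estimate can be checked against closed-form values. In a real model, the quality of the proposal (here, just the prior) governs how many samples K are needed for stable weights.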
Abstract
Machine learning methods are widely and successfully used for probabilistic wind power forecasting, yet the pervasive issue of missing values (e.g., due to sensor faults or communication outages) has received limited attention. The prevailing practice is impute-then-predict, but conditioning on point imputations biases parameter estimates and fails to propagate the uncertainty of the missing features. Our approach treats missing features and forecast targets uniformly: we learn a joint generative model of features and targets from incomplete data and, at operational deployment, condition on the observed features and marginalize the unobserved ones to produce forecasts. This imputation-free procedure avoids the errors introduced by imputation and preserves the uncertainty arising from missing features. In experiments, it improves forecast quality in terms of continuous ranked probability score (CRPS) relative to impute-then-predict baselines while incurring substantially lower computational cost than common alternatives.
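The difference between marginalizing and imputing can be written out under an illustrative assumption that the summary does not state: features and target are conditionally independent given a latent variable z, as in a factorized VAE decoder. Here x_o and x_m denote the observed and missing features, o is the index set of observed features, and \hat{x}_m is a point imputation.

```latex
\begin{aligned}
p(y \mid x_o) &= \int p(y, x_m \mid x_o)\, \mathrm{d}x_m
  = \int p(y \mid z) \Big[ \int p(x_m \mid z)\, \mathrm{d}x_m \Big] p(z \mid x_o)\, \mathrm{d}z
  = \int p(y \mid z)\, p(z \mid x_o)\, \mathrm{d}z, \\
p(z \mid x_o) &\propto p(z) \prod_{j \in o} p(x_j \mid z), \\
\text{impute-then-predict:}\quad & p\big(y \mid x_o, x_m = \hat{x}_m\big).
\end{aligned}
```

The bracketed integral equals 1, so the predictive distribution depends only on the observed entries. The last line, by contrast, commits to a single value of x_m and discards its uncertainty.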
