Data Requirements and Prediction Scaling for Long-Term Failure Forecasts in Wind Turbines
Viktor Begun, Ulrich Schlickewei
TL;DR
This paper addresses long-term failure forecasting in wind turbines by introducing turbine-years ($TY$) as a standardized measure of dataset size and examining how forecast horizons scale with it. Through a literature survey using Google Scholar and Scopus, it characterizes predictions by dataset size, method, and sensor type (SCADA versus vibration). It finds an approximately linear relationship in which the forecast horizon in days scales as $2 \times TY$, with a practical minimum of $0.4\,TY$ for forecasts of at least 2 days and a transition around $TY \approx 10$ from specialized methods to ML and statistical approaches. It also shows that vibration data provide substantial horizon gains for small datasets, while at large $TY$ SCADA-based ML approaches achieve comparable horizons. The work defines data-driven benchmarks for what constitutes “big data” and “long-term” forecasts in wind turbines and highlights the need for more standardized comparisons across methods and datasets.
Abstract
We investigate the key factors that enable early failure forecasting in wind turbines. For this purpose, we analyze studies with long-term forecasts and compare their main features: prediction time, methods, targeted components, and dataset size. We also check the effect of using additional sensors. We find that the size of the dataset is the main factor and that an approximately linear scaling holds: the number of forecast days is about twice the size of the dataset, measured in turbine-years. The data also allow us to quantify the meaning of "big" and "long" in the terms "big data" and "long-term" forecasts, which are found to be ten turbine-years and two weeks, respectively.
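The scaling stated above can be illustrated with a minimal sketch. It assumes that one turbine-year means one turbine observed for one year, so a dataset's size is the number of turbines times the years of data per turbine. The function names and the example fleet are hypothetical and are not taken from the paper. The factor of two is the approximate slope quoted in the abstract, so the result is a rough rule of thumb, not a fitted model.

```python
# Illustrative sketch only: names and example values are assumptions.

def turbine_years(n_turbines: int, years_per_turbine: float) -> float:
    """Dataset size in turbine-years, assuming equal coverage per turbine."""
    return n_turbines * years_per_turbine


def approx_forecast_days(ty: float, slope: float = 2.0) -> float:
    """Rule-of-thumb forecast horizon in days: roughly slope * TY."""
    return slope * ty


# Hypothetical fleet: 5 turbines with 3 years of data each.
ty = turbine_years(5, 3.0)
horizon_days = approx_forecast_days(ty)
print(f"dataset size: {ty} TY, approximate horizon: {horizon_days} days")
```

Because the relationship is only approximate, individual studies can deviate noticeably from the value this sketch returns.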
