Data Requirements and Prediction Scaling for Long-Term Failure Forecasts in Wind Turbines

Viktor Begun, Ulrich Schlickewei

TL;DR

This paper addresses the challenge of long-term failure forecasting in wind turbines by introducing turbine-years ($TY$) as a standardized metric for dataset size and examining how forecast horizons scale with data. Through a literature survey using Google Scholar and Scopus, it characterizes predictions by dataset size, methods, and sensors (SCADA versus vibration). It finds an approximately linear relationship in which the forecast horizon in days is about $2 \times TY$, with a practical minimum of $0.4\,TY$ for forecasts of at least 2 days and a transition around $TY \approx 10$ from specialized methods to ML/statistical approaches. It also shows that vibration data provide substantial horizon gains for small datasets, while large $TY$ enables SCADA-based ML approaches to achieve comparable horizons. The work defines data-driven benchmarks for what constitutes “big data” and “long-term” forecasts in wind turbines and highlights the need for more standardized comparisons across methods and datasets to advance the field.

Abstract

We investigate the key factors that enable early failure forecasting in wind turbines. For this purpose, we analyze studies with long-term forecasts and compare their main features: prediction time, methods, targeted components, and dataset size. We also check the effect of using additional sensors. We find that the size of the dataset is the main factor and that an approximate linear scaling holds: the number of forecast days is twice the size of the dataset, measured in turbine years. We also observe that the data allow us to quantify the meaning of "big" and "long" in the terms "big data" and "long-term" forecasts, which are found to be ten turbine years and two weeks, respectively.
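
The scaling stated in the abstract can be illustrated with a short calculation. The sketch below is a minimal illustration, not the authors' code: it assumes that dataset size in turbine years is the number of turbines multiplied by the years of data per turbine, and all function names and default arguments are hypothetical. The slope of two days per turbine year and the thresholds of ten turbine years and two weeks are taken from the abstract; the relation itself is only approximate.

```python
# Illustrative sketch of the approximate scaling reported in the abstract.
# All names here are hypothetical; none of them come from the paper.


def turbine_years(n_turbines: int, years_per_turbine: float) -> float:
    """Dataset size in turbine years (assumed: turbines x years of data each)."""
    return n_turbines * years_per_turbine


def estimated_horizon_days(ty: float, days_per_ty: float = 2.0) -> float:
    """Approximate forecast horizon in days: about twice the turbine years."""
    return days_per_ty * ty


def is_big_data(ty: float, threshold_ty: float = 10.0) -> bool:
    """'Big data' in the abstract's sense: about ten turbine years or more."""
    return ty >= threshold_ty


def is_long_term(horizon_days: float, threshold_days: float = 14.0) -> bool:
    """'Long-term' in the abstract's sense: about two weeks or more."""
    return horizon_days >= threshold_days


if __name__ == "__main__":
    # Hypothetical example: 5 turbines observed for 3 years each.
    ty = turbine_years(5, 3.0)
    horizon = estimated_horizon_days(ty)
    print(f"dataset size: {ty} turbine years")
    print(f"estimated horizon: {horizon} days")
    print(f"big data: {is_big_data(ty)}, long-term: {is_long_term(horizon)}")
```

The result is a rough order-of-magnitude estimate only; individual studies in the survey scatter around the linear trend.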

Paper Structure (4 sections, 1 figure, 2 tables)

Figures (1)

  • Figure 1: Prediction time as a function of dataset size measured in turbine years. The dotted lines indicate the forecast boundaries assuming saturation, while the dashed line shows the linear estimate: prediction days equal twice the number of turbine years. Labels denote four groups: predictions for small datasets with $TY<10$ (Specific), large datasets with $TY>10$ (ML/Statistic), normal behavior models (NBM), and datasets with both SCADA and vibration measurements (Vibration).