The temporal overfitting problem with applications in wind power curve modeling

Abhinav Prakash, Rui Tuo, Yu Ding

TL;DR

A Gaussian process (GP)-based method is proposed to tackle temporal overfitting, which arises in nonparametric regression when the input variables and the errors are autocorrelated in time.

Abstract

This paper is concerned with a nonparametric regression problem in which the input variables and the errors are autocorrelated in time. The motivation for the research stems from modeling wind power curves. Using existing model selection methods, such as cross validation, results in model overfitting in the presence of temporal autocorrelation. This phenomenon is referred to as temporal overfitting, and it causes a loss of performance when predicting responses for a time domain different from the training time domain. We propose a Gaussian process (GP)-based method to tackle the temporal overfitting problem. Our model is partitioned into two parts: a time-invariant component and a time-varying component, each of which is modeled through a GP. We modify the inference method to a thinning-based strategy, an idea borrowed from Markov chain Monte Carlo sampling, to overcome temporal overfitting and estimate the time-invariant component. We extensively compare our proposed method with both existing power curve models and available ideas for handling temporal overfitting on real wind turbine datasets. Our approach yields significant improvement when predicting the response for a time period different from the training time period. Supplementary material and computer code for this article are available online.
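
The thinning idea mentioned in the abstract can be illustrated with a short sketch. It assumes that thinning means splitting the time-ordered observations into interleaved subsets, so that neighboring points within a subset are `thinning_number` time steps apart, as in MCMC thinning. The function name `thin_indices` and the fixed thinning number are hypothetical choices for illustration; the paper's own procedure for choosing the thinning number and fitting the two GP components is not reproduced here.

```python
import numpy as np


def thin_indices(n_points, thinning_number):
    """Split indices 0..n_points-1 into interleaved subsets.

    Subset k holds indices k, k + T, k + 2T, ... (T = thinning_number),
    so consecutive points inside a subset are T time steps apart.
    This is an illustrative assumption, not the authors' code.
    """
    if thinning_number < 1:
        raise ValueError("thinning_number must be a positive integer")
    return [
        np.arange(start, n_points, thinning_number)
        for start in range(thinning_number)
    ]


# Hypothetical example: 10 time-ordered observations, thinning number 3.
subsets = thin_indices(10, 3)
for k, idx in enumerate(subsets):
    print(f"subset {k}: {idx.tolist()}")
```

Each subset keeps the original time order but drops the closest neighbors, which is how thinning weakens the autocorrelation seen within a subset.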

Paper Structure

This paper contains 17 sections, 14 equations, 6 figures, 7 tables.

Figures (6)

  • Figure 1: A nominal wind power curve. Dots (red) denote the data; piecewise constant curve (blue) represents binning; smooth curve (black) is from smoothing on binning.
  • Figure 2: Effect of correlation between input variable and error on functional estimate: a) correlated errors; b) independent errors. We use $f(x) = 5x^2$. The correlated error sequence is generated using a zero mean GP with input $x$ and an exponential kernel with a lengthscale of 0.05.
  • Figure 3: A schematic of time-split cross-validation. Each block represents a group of temporally adjacent data points.
  • Figure 4: Relative RMSEs as compared to binning RMSE for out-of-temporal datasets. The top two plots are for dataset $\mathcal{T}_2$ with the top-left plot a) for kNN, AMK, tempGP, and regGP and the top-right plot b) for TS-kNN, CVc-kNN, PW-AMK, and tempGP. The bottom two plots are for dataset $\mathcal{T}_3$ with the bottom-left plot c) for kNN, AMK, tempGP, and regGP and the bottom-right plot d) for TS-kNN, CVc-kNN, PW-AMK, and tempGP.
  • Figure 5: Box plots for relative RMSE using different thinning numbers for all the turbines: a) for test set $\mathcal{T}_2$; b) for test set $\mathcal{T}_3$. "Adp" denotes the adaptive thinning number computed using the proposed approach.
  • ...and 1 more figures
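
The setup described in the Figure 2 caption can be sketched in a few lines. The caption fixes $f(x) = 5x^2$, a zero-mean GP error with input $x$, and an exponential kernel with a lengthscale of 0.05. Everything else below is an assumption made for illustration: the evenly spaced grid on $[0, 1]$, the sample size `n`, the error standard deviation `sigma`, the random seed, and the helper `lag1_autocorr`.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumed seed, for reproducibility only

n = 200                          # assumed sample size
x = np.linspace(0.0, 1.0, n)     # assumed input grid
f = 5.0 * x**2                   # true function from the caption

lengthscale = 0.05               # from the caption
sigma = 1.0                      # assumed error standard deviation

# Exponential kernel: k(x, x') = sigma^2 * exp(-|x - x'| / lengthscale)
K = sigma**2 * np.exp(-np.abs(x[:, None] - x[None, :]) / lengthscale)

# Draw one correlated error path via a Cholesky factor (small jitter for stability).
chol = np.linalg.cholesky(K + 1e-8 * np.eye(n))
eps_corr = chol @ rng.standard_normal(n)

# Independent errors with the same marginal variance, for comparison.
eps_ind = sigma * rng.standard_normal(n)

y_corr = f + eps_corr
y_ind = f + eps_ind


def lag1_autocorr(e):
    """Sample lag-1 autocorrelation of a sequence ordered by x."""
    e = e - e.mean()
    return float(np.sum(e[:-1] * e[1:]) / np.sum(e * e))


print("lag-1 autocorrelation, correlated errors: ", lag1_autocorr(eps_corr))
print("lag-1 autocorrelation, independent errors:", lag1_autocorr(eps_ind))
```

Fitting any flexible smoother to `y_corr` and to `y_ind` and comparing each fit with `f` gives a rough reproduction of the contrast between panels a) and b).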