
Robustness investigation of cross-validation based quality measures for model assessment

Thomas Most, Lars Gräning, Sebastian Wolff

TL;DR

The accuracy and robustness of quality measures for the assessment of machine learning models are investigated, and local quality measures are derived from the prediction residuals obtained by cross-validation.

Abstract

This paper investigates the accuracy and robustness of quality measures for the assessment of machine learning models. The prediction quality of a machine learning model is evaluated in a model-independent way using a cross-validation approach, in which the approximation error is estimated for unknown data. The presented measures quantify the amount of explained variation in the model prediction. Their reliability is assessed by means of several numerical examples for which an additional data set is available to verify the estimated prediction error. Furthermore, confidence bounds of the presented quality measures are estimated, and local quality measures are derived from the prediction residuals obtained by cross-validation.
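As a rough illustration of the idea, the following Python sketch computes cross-validation prediction residuals for a simple polynomial model and derives an explained-variation measure from them. It is not the authors' implementation: the helper names (`cross_validation_residuals`, `explained_variation`, `make_polynomial`), the 5-fold split, the synthetic quadratic data, and the assumed form $1 - SS_E / SS_T$ of the measure are all illustrative assumptions. The labels CoD (from fitting residuals) and CoP (from prediction residuals) and the $\pm 3\times RMSE^{cv}$ outlier range follow the paper's figure captions.

```python
import numpy as np


def cross_validation_residuals(x, y, fit, predict, n_folds=5, seed=0):
    """Return one prediction residual per sample, each obtained from a
    model that was trained without that sample (k-fold cross-validation)."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(y))
    residuals = np.empty(len(y), dtype=float)
    for test_idx in np.array_split(indices, n_folds):
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        model = fit(x[train_mask], y[train_mask])
        residuals[test_idx] = y[test_idx] - predict(model, x[test_idx])
    return residuals


def explained_variation(y, residuals):
    """Assumed form of the measure: 1 - SS_E / SS_T."""
    ss_total = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - np.sum(residuals ** 2) / ss_total


def make_polynomial(order):
    """Hypothetical model wrapper: least-squares polynomial of given order."""
    def fit(x, y):
        return np.polyfit(x, y, order)

    def predict(coeffs, x):
        return np.polyval(coeffs, x)

    return fit, predict


# Synthetic example data (assumption): noisy one-dimensional quadratic.
rng = np.random.default_rng(42)
x = np.linspace(-1.0, 1.0, 40)
y = x ** 2 + rng.normal(scale=0.05, size=x.size)

fit, predict = make_polynomial(2)

# Fitting residuals use the model trained on all samples.
fit_res = y - predict(fit(x, y), x)
# Prediction residuals come from cross-validation.
cv_res = cross_validation_residuals(x, y, fit, predict)

cod = explained_variation(y, fit_res)  # goodness of fit
cop = explained_variation(y, cv_res)   # estimated prediction quality

# Flag samples whose prediction residual exceeds 3 * RMSE^cv.
rmse_cv = np.sqrt(np.mean(cv_res ** 2))
possible_outliers = np.abs(cv_res) > 3.0 * rmse_cv

print(f"CoD = {cod:.3f}, CoP = {cop:.3f}, RMSE_cv = {rmse_cv:.4f}")
```

Because every prediction residual comes from a model that has not seen the corresponding sample, the cross-validation measure cannot exceed the fitting measure for a least-squares model such as this one. A growing gap between the two values as the polynomial order increases is a sign of overfitting.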

Paper Structure (15 sections, 30 equations, 23 figures, 2 tables)

This paper contains 15 sections, 30 equations, 23 figures, 2 tables.

Figures (23)

  • Figure 1: Approximation of noisy data points of a one-dimensional quadratic function with a polynomial model of increasing order
  • Figure 2: Basic cross-validation procedure by splitting the data set into two subsets: Using set one for training and set two for prediction (left) and set two for training and set one for prediction (right)
  • Figure 3: Moving Least Squares approximation of noisy data points of a non-linear function for different values of the influence radius and corresponding difference between CoD and CoP
  • Figure 4: Residual plot with the fitting and prediction residuals (left) and sample CoP, which quantifies the contribution of each sample to the CoP (right). One possible outlier, which lies outside the range of $\pm 3\times RMSE^{cv}$, is indicated in red.
  • Figure 5: Estimated local root mean squared error (left) and the local Coefficient of Prognosis (right) as subspace plot in a 5D input space.
  • ...and 18 more figures