
Evaluation of Bagging Predictors with Kernel Density Estimation and Bagging Score

Philipp Seitz, Jan Schmitt, Andreas Schiffler

Abstract

For a larger set of predictions from several differently trained machine learning models, known as bagging predictors, the mean of all predictions is taken by default. However, this procedure can deviate from the actual ground truth in certain parameter regions. An approach is presented that determines a representative y_BS from such a set of predictions using Kernel Density Estimation (KDE) in nonlinear regression with Neural Networks (NN), and simultaneously provides an associated quality criterion beta_BS, called the Bagging Score (BS), which reflects the confidence of the obtained ensemble prediction. It is shown that the new approach yields better predictions than the common use of the mean or median. In addition, the method is contrasted with several approaches to nonlinear regression from the literature, achieving a top ranking in each of the calculated error values without using any optimization or feature selection technique.
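The abstract's core idea can be sketched in code: instead of averaging an ensemble's predictions, estimate their density with a KDE and take the density mode as the representative value, with a concentration measure as a rough confidence score. This is an illustrative sketch only; the function name `bagging_score`, the grid-based mode search, and the simple concentration measure used for `beta` are assumptions for demonstration, not the paper's exact definitions of y_BS and beta_BS.

```python
import numpy as np
from scipy.stats import gaussian_kde

def bagging_score(predictions, grid_size=512):
    """Sketch of a KDE-based ensemble evaluation.

    Returns (y_bs, beta): y_bs is the mode of the KDE fitted to the
    ensemble predictions; beta is a placeholder confidence score
    (the fraction of predictions within one KDE bandwidth of the mode),
    NOT the Bagging Score as defined in the paper.
    """
    preds = np.asarray(predictions, dtype=float)
    kde = gaussian_kde(preds)  # Gaussian kernels, Scott's-rule bandwidth
    lo, hi = preds.min(), preds.max()
    pad = 0.1 * (hi - lo + 1e-12)  # small margin so the mode is not clipped
    grid = np.linspace(lo - pad, hi + pad, grid_size)
    density = kde(grid)
    y_bs = grid[np.argmax(density)]  # mode of the prediction density
    bandwidth = kde.factor * preds.std(ddof=1)
    beta = float(np.mean(np.abs(preds - y_bs) < bandwidth))
    return float(y_bs), beta
```

With a skewed prediction set (many predictions near one value, a few outliers), the KDE mode stays on the dominant cluster while the mean is pulled toward the outliers, which is the failure mode of mean/median aggregation that the abstract describes.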


Paper Structure

This paper contains 8 sections, 4 equations, 2 figures, 2 tables, and 1 algorithm.

Figures (2)

  • Figure 1: Examples of expected and unexpected asymmetric cumulative distributions arising when evaluating bagging predictors, where the mean and median are likely to deviate from the actual ground truth.
  • Figure 2: Use case of determining the ensemble prediction and its Bagging Score, shown for two examples of a synthetic function $f_{GT}(x)$. Top: example of a well-behaved normal distribution in the prediction set. Bottom: example of a shifted distribution.