Comparative Evaluation of Applicability Domain Definition Methods for Regression Models

Shakir Khurshid, Bharath Kumar Loganathan, Matthieu Duvinage

TL;DR

This work proposes a novel approach based on non-deterministic Bayesian neural networks to define the applicability domain of a model. The proposed method defined the applicability domain more accurately than previous methods.

Abstract

The applicability domain refers to the range of data for which the predictions of a model are expected to be reliable and accurate; using a model outside its applicability domain can lead to incorrect results. The ability to define the regions in data space where a predictive model can be safely used is a necessary condition for assuring the reliability of new predictions. However, defining the applicability domain of a model is a challenging problem, as there is no clear and universal definition or metric for it. This work aims to make the applicability domain more quantifiable and pragmatic. Eight applicability domain detection techniques were applied to seven regression models, trained on five different datasets, and their performance was benchmarked using a validation framework. We also propose a novel approach based on non-deterministic Bayesian neural networks to define the applicability domain of the model. Our method defined the applicability domain more accurately than previous methods, highlighting its potential in this regard.
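
The idea of using a non-deterministic network as an applicability domain measure can be illustrated with a minimal sketch. The abstract does not specify the architecture or the uncertainty statistic, so everything here is an assumption made for illustration: a toy linear model with independent Gaussian weights stands in for a Bayesian neural network, the standard deviation over repeated stochastic forward passes serves as the AD value, and the names `sample_predictions` and `ad_values` are hypothetical.

```python
import numpy as np

def sample_predictions(x, w_mean, w_std, n_iter=200, seed=0):
    """Run n_iter stochastic forward passes of a toy Bayesian linear model."""
    rng = np.random.default_rng(seed)
    # Assumption: each pass draws one weight vector from an independent
    # Gaussian posterior with the given means and standard deviations.
    weights = rng.normal(w_mean, w_std, size=(n_iter, w_mean.shape[0]))
    return weights @ x.T  # shape: (n_iter, n_samples)

def ad_values(x, w_mean, w_std, n_iter=200, seed=0):
    """Per-sample spread of the predictions; larger means less confident."""
    return sample_predictions(x, w_mean, w_std, n_iter, seed).std(axis=0)

# Hypothetical posterior and two test points: one near the origin, one far away.
w_mean = np.array([1.0, -2.0])
w_std = np.array([0.1, 0.1])
x_test = np.array([[0.1, 0.1], [5.0, 5.0]])

scores = ad_values(x_test, w_mean, w_std)
print(scores)  # the second (distant) point receives the larger AD value
```

In this toy setting the spread grows with the magnitude of the input, so the distant point is flagged as less trustworthy. A real Bayesian neural network would replace the weight sampling step with its own posterior or approximate posterior.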

Paper Structure

This paper contains 27 sections, 16 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: An illustrative example (Reference1) of the applicability domain problem. Within the green region, a linear model (represented by the red line) provides a good approximation of the data. However, outside the green region, the linear model's approximation is not valid. As a result, the applicability domain of the linear model is defined by the interval [-1, 1], which corresponds to the green region.
  • Figure 2: Evaluating BNNs on a test set for n iterations. A high variance in the output indicates low confidence, while a low variance implies higher confidence. This figure is inspired by the work of Sabber2019.
  • Figure 3: The workflow of our study. We start by training the regression model and evaluating it on the test set. Then we apply the AD measure to get the AD values. The AD values, along with the test error, are used to assess the performance of the AD measure. We expected to observe a monotonically increasing trend between the AD values and the test error.
  • Figure 4: An example of the coverage plot is illustrated, where the y-axis represents the cumulative error of predictions, and the x-axis depicts AD values on a percentage scale. We set the threshold at the 25th percentile of cumulative test errors. The coverage of the AD measure is measured by the percentage of data falling within this threshold. Notice the "red" AD measure outperforms the "green" since it covers roughly 60% of the test set's data, compared to 35% for the green.
  • Figure 5: The area-under-curve (AUC) criterion corresponds to the filled area between the moving average plot and the average error of the model (the red horizontal line).
  • ...and 1 more figures
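
The coverage and area-under-curve criteria described in the captions of Figures 4 and 5 can be sketched as follows. The captions leave several details open, so this is one possible reading, not the authors' implementation: the threshold is taken as 25% of the total cumulative error, the area is the unsigned gap between the moving average and the mean error on an x-axis normalized to [0, 1], and the function names `coverage` and `area_between` and the window size are hypothetical.

```python
import numpy as np

def coverage(ad, err, frac=0.25):
    """Share of test points reached before cumulative error exceeds frac of the total."""
    order = np.argsort(ad)                 # lowest AD value (most trusted) first
    cum_err = np.cumsum(err[order])
    threshold = frac * cum_err[-1]         # assumption: 25% of total cumulative error
    return np.searchsorted(cum_err, threshold, side="right") / len(err)

def area_between(ad, err, window=5):
    """Area between the moving-average error (sorted by AD) and the mean error."""
    order = np.argsort(ad)
    kernel = np.ones(window) / window
    moving_avg = np.convolve(err[order], kernel, mode="valid")
    gap = np.abs(moving_avg - err.mean())
    x = np.linspace(0.0, 1.0, len(gap))    # normalized x-axis
    # Trapezoidal rule written out by hand.
    return float(np.sum((gap[1:] + gap[:-1]) / 2 * np.diff(x)))

# Toy data: absolute test errors 1..10 for ten test points.
err = np.arange(1, 11, dtype=float)
cov_good = coverage(err, err)     # AD values that rank points exactly by error
cov_bad = coverage(-err, err)     # AD values that rank points in reverse
print(cov_good, cov_bad)          # 0.4 0.1
print(area_between(err, err))
```

Under this reading, an AD measure that orders points by their true error covers more of the test set before the error budget is spent, which matches the caption's statement that the measure with higher coverage is the better one.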