
Validation of ML-UQ calibration statistics using simulated reference values: a sensitivity analysis

Pascal Pernot

Abstract

Some popular Machine Learning Uncertainty Quantification (ML-UQ) calibration statistics do not have predefined reference values and are mostly used in comparative studies. As a consequence, calibration is almost never validated, and the diagnostic is left to the reader's judgment. Simulated reference values, based on synthetic calibrated datasets derived from actual uncertainties, have been proposed to address this problem. As the generative probability distribution for the simulation of synthetic errors is often not constrained, the sensitivity of simulated reference values to the choice of generative distribution might be problematic, casting doubt on the calibration diagnostic. This study explores various facets of this problem and shows that some statistics are too sensitive to the choice of generative distribution to be used for validation when that distribution is unknown. This is the case, for instance, for the correlation coefficient between absolute errors and uncertainties (CC) and for the expected normalized calibration error (ENCE). A robust validation workflow to deal with simulated reference values is proposed.
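The idea of a simulated reference value can be illustrated with a minimal sketch. It is not the author's implementation, and it rests on several assumptions: synthetic errors are drawn as $E_i = u_i \epsilon_i$ with $\epsilon_i \sim D$ and $D$ of unit variance; CC is taken here as the Pearson correlation between $|E|$ and $u$; the uncertainties `u` are hypothetical log-normal values standing in for a real dataset; and the function names are illustrative only.

```python
import numpy as np

def unit_variance_student(rng, nu, size):
    """Draw from a Student's-t distribution rescaled to unit variance (needs nu > 2)."""
    if nu <= 2:
        raise ValueError("variance is infinite for nu <= 2")
    return rng.standard_t(nu, size=size) * np.sqrt((nu - 2) / nu)

def simulated_cc_reference(u, draw, n_mc=500, seed=0):
    """Monte Carlo mean and standard deviation of CC(|E|, u) over
    synthetic calibrated datasets built from the uncertainties u."""
    rng = np.random.default_rng(seed)
    cc = np.empty(n_mc)
    for i in range(n_mc):
        e = u * draw(rng, u.size)  # synthetic errors E_i = u_i * eps_i, eps_i ~ D
        cc[i] = np.corrcoef(np.abs(e), u)[0, 1]  # Pearson CC (an assumption)
    return cc.mean(), cc.std(ddof=1)

# Hypothetical uncertainties, used only for illustration
u = np.random.default_rng(1).lognormal(mean=0.0, sigma=0.5, size=2000)

# Same uncertainties, two generative distributions D
ref_normal = simulated_cc_reference(u, lambda rng, n: rng.standard_normal(n))
ref_student = simulated_cc_reference(u, lambda rng, n: unit_variance_student(rng, 4, n))

print("CC reference, D = N(0,1):  mean=%.3f sd=%.3f" % ref_normal)
print("CC reference, D = t_s(4): mean=%.3f sd=%.3f" % ref_student)
```

Comparing the two reference values for the same `u` gives a direct, if rough, view of how much the choice of $D$ can shift the reference against which a CC estimate would be validated.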

Paper Structure (24 sections, 28 equations, 10 figures, 6 tables)


Figures (10)

  • Figure 1: Flowchart for the validation of a statistic $\vartheta$. $\vartheta_{ref}$ is the reference value used for validation, $\vartheta_{est}$ is the actual value of the statistic, $I_{BS}$ is the bootstrapped CI for $\vartheta_{est}$ and $D$ is the error generative distribution.
  • Figure 2: Z-score distributions (histograms) with normal (red line) and scaled and shifted Student's-t (blue line) fits. For legibility, the histograms have been truncated to $\pm3$ standard deviations, hiding a few outlying values.
  • Figure 3: Recovery of the error distributions with the generative model, using a normal distribution (red line) or a unit-variance Student's-t distribution $t_{s}(\nu_{Z})$ ($\nu_{Z}$ from Table \ref{tab:data-summary}; blue line). For Set 6, the Student's-t model cannot be generated with the fitted value because the variance is infinite for $\nu_{Z}<2$; $\nu_{Z}=2.1$ was used instead.
  • Figure 4: Sensitivity of $\tilde{\vartheta}_{D,ref}$ to the dataset for the ZMS (a), CC (b), ENCE (c, d) and ZMSE (e, f) statistics. The generative distribution is standard normal, $D=N(0,1)$.
  • Figure 5: Sensitivity of the simulated reference values to the generative distribution. Calibrated datasets are generated from the actual uncertainties of Sets 7 and 8, using a generative distribution $D=t_{s}(\nu)$ with $\nu$ degrees of freedom.
  • ...and 5 more figures