Robustness Quantification and Uncertainty Quantification: Comparing Two Methods for Assessing the Reliability of Classifier Predictions

Adrián Detavernier, Jasper De Bock

Abstract

We consider two approaches for assessing the reliability of the individual predictions of a classifier: Robustness Quantification (RQ) and Uncertainty Quantification (UQ). We explain the conceptual differences between the two approaches, compare them on a number of benchmark datasets, and show that RQ is capable of outperforming UQ, both in a standard setting and in the presence of distribution shift. Besides showing that RQ can be competitive with UQ, we also demonstrate that the two approaches are complementary: combining them can lead to even better reliability assessments.

Paper Structure

This paper contains 11 sections, 10 equations, 3 figures, 6 tables.

Figures (3)

  • Figure 1: Situating robustness quantification and uncertainty quantification with respect to reliability quantification.
  • Figure 2: Each graph shows the mean of the accuracy rejection curves (ARCs) for the Naive Bayes classifier (NBC) on the Student Performance Port (D14) dataset, for one combination of training set size $N$ and feature noise $\beta$.
  • Figure 3: This point cloud (logarithmic scale) shows, for each instance, whether its predicted class was correct (green) or wrong (red). The plot was made for dataset D4, with the NBC as the classifier.