When to Accept Automated Predictions and When to Defer to Human Judgment?

Daniel Sikar; Artur Garcez; Tillman Weyde; Robin Bloomfield; Kaleem Peeroo

TL;DR

This paper addresses when automated predictions can be trusted under distribution shift by introducing a centroid-distance confidence metric. Each class centroid is the mean softmax output of the correctly classified training examples of that class, and the safety threshold for a class is the smallest distance from an incorrect prediction to that class centroid. Empirical validation on MNIST with a CNN and on CIFAR-10 with a Vision Transformer shows that predictions close to a centroid are typically correct, while misclassifications lie farther away, with a very small testing overlap (~0.04%) between safe and unsafe predictions. The results support a practical, domain-specific threshold strategy that gates automated decisions and defers uncertain cases to humans, with avenues for extending the approach to autonomous-systems safety and to alternative thresholding models.

Abstract

Ensuring the reliability and safety of automated decision-making is crucial. It is well known that data distribution shifts in machine learning can produce unreliable outcomes. This paper proposes a new approach for measuring the reliability of predictions under distribution shifts. We analyze how the outputs of a trained neural network change, using clustering to measure distances between outputs and class centroids. We propose this distance as a metric to evaluate the confidence of predictions under distribution shifts. We assign each prediction to a cluster whose centroid is the mean softmax output over all correct predictions of a given class. We then define the safety threshold for a class as the smallest distance from an incorrect prediction to that class centroid. We evaluate the approach on the MNIST and CIFAR-10 datasets using a Convolutional Neural Network and a Vision Transformer, respectively. The results show that our approach is consistent across these datasets and network models, and indicate that the proposed metric can offer an efficient way of determining when automated predictions are acceptable and when they should be deferred to human operators given a distribution shift.
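The centroid and threshold definitions in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions that this summary does not confirm: distances are Euclidean, incorrect predictions are grouped by their predicted class, and a class with no incorrect predictions receives an infinite threshold. The function names (`class_centroids`, `safety_thresholds`, `accept`) and the toy data are hypothetical.

```python
import numpy as np

def class_centroids(softmax, labels, num_classes):
    """Per-class mean softmax vector over correctly classified examples."""
    preds = softmax.argmax(axis=1)
    correct = preds == labels
    centroids = np.zeros((num_classes, softmax.shape[1]))
    for c in range(num_classes):
        mask = correct & (labels == c)
        if mask.any():
            centroids[c] = softmax[mask].mean(axis=0)
    return centroids

def safety_thresholds(softmax, labels, centroids):
    """Smallest distance from an incorrect prediction to its class centroid.

    Assumption: Euclidean distance, grouped by predicted class.
    Assumption: classes without incorrect predictions get threshold inf.
    """
    preds = softmax.argmax(axis=1)
    dists = np.linalg.norm(softmax - centroids[preds], axis=1)
    wrong = preds != labels
    thresholds = np.full(len(centroids), np.inf)
    for c in range(len(centroids)):
        mask = wrong & (preds == c)
        if mask.any():
            thresholds[c] = dists[mask].min()
    return thresholds

def accept(softmax, centroids, thresholds):
    """Return predicted classes and a flag: True = accept, False = defer."""
    preds = softmax.argmax(axis=1)
    dists = np.linalg.norm(softmax - centroids[preds], axis=1)
    return preds, dists < thresholds[preds]

# Toy two-class example (made-up softmax outputs, not data from the paper).
train_softmax = np.array([
    [0.9, 0.1],  # true class 0, predicted 0 (correct)
    [0.8, 0.2],  # true class 0, predicted 0 (correct)
    [0.1, 0.9],  # true class 1, predicted 1 (correct)
    [0.6, 0.4],  # true class 1, predicted 0 (incorrect)
])
train_labels = np.array([0, 0, 1, 1])

centroids = class_centroids(train_softmax, train_labels, num_classes=2)
thresholds = safety_thresholds(train_softmax, train_labels, centroids)

new_softmax = np.array([[0.88, 0.12], [0.55, 0.45]])
preds, accepted = accept(new_softmax, centroids, thresholds)
print(preds, accepted)
```

In this toy run the first new output lies well inside the class 0 threshold and is accepted, while the second lies beyond it and would be deferred to a human operator. The infinite fallback threshold accepts everything for a class with no observed errors, so a stricter policy may be preferable in practice.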

Paper Structure (4 sections, 9 figures, 3 tables, 2 algorithms)

This paper contains 4 sections, 9 figures, 3 tables, 2 algorithms.

Background
Clustering and Softmax Distance as a Confidence Metric
Experimental Results and Discussion
Conclusions and Future Work

Figures (9)

Figure 1: Please zoom in for detail. Confusion matrices for the MNIST classification model on the training dataset (left) and testing dataset (right). The matrices display the true labels on the vertical axis and the predicted labels on the horizontal axis. The diagonal elements represent correctly classified instances, while the off-diagonal elements indicate misclassifications.
Figure 2: Please zoom in for detail. Confusion matrices for the CIFAR-10 classification model on the training dataset (left) and testing dataset (right).
Figure 3: Please zoom in for detail. Expected accuracy linear fit based on prediction softmax distance to class centroid. MNIST fit is on the left, CIFAR-10 is on the right.
Figure 4: Distribution of distances to centroids for correctly and incorrectly classified instances in the CIFAR-10 training data (left) and testing data (right). Distances on the y axis are shown on a logarithmic scale. Note that centroids are obtained from correctly classified training examples and then used for both the training and testing datasets; no cluster is created from the testing softmax distances.
Figure 5: Please zoom in for detail. Expected accuracy decrease as a result of threshold decrease. MNIST data is on the left, CIFAR-10 is on the right. The x axis shows the threshold decrement in steps of 0.1; that is, at 0.1 the threshold is 90% of the original threshold, while at 0.9 the threshold is 10% of the original threshold and consequently nearer to the class centroid.
...and 4 more figures
