Explorations of the Softmax Space: Knowing When the Neural Network Doesn't Know

Daniel Sikar, Artur d'Avila Garcez, Tillman Weyde

TL;DR

This paper tackles the reliability of neural network decisions under distribution shift by introducing a centroid-based confidence metric in the softmax space. It clusters softmax output vectors, computes class centroids from correct predictions, and uses a distance to these centroids with a data-driven threshold to decide when to return a not-known response. The approach is evaluated on MNIST with a CNN and CIFAR-10 with a Vision Transformer, showing consistent clustering of correct predictions near centroids and increased unknown rejection for out-of-distribution-like data, with thresholding offering a tunable balance between retention and accuracy. The results suggest a practical, low-cost mechanism to signal low-confidence predictions and defer uncertain cases to human operators in safety-critical settings.

Abstract

Ensuring the reliability of automated decision-making based on neural networks will be crucial as Artificial Intelligence systems are deployed more widely in critical situations. This paper proposes a new approach for measuring confidence in the predictions of any neural network that relies on the predictions of a softmax layer. We identify that a high-accuracy trained network may have certain outputs for which there should be low confidence. In such cases, decisions should be deferred and it is more appropriate for the network to provide a "not known" answer to a corresponding classification task. Our approach clusters the vectors in the softmax layer to measure distances between cluster centroids and network outputs. We show that a cluster with centroid calculated simply as the mean softmax output for all correct predictions can serve as a suitable proxy in the evaluation of confidence. Defining a distance threshold for a class as the smallest distance from an incorrect prediction to the given class centroid offers a simple approach to adding "not known" answers to any network classification falling outside of the threshold. We evaluate the approach on the MNIST and CIFAR-10 datasets using a Convolutional Neural Network and a Vision Transformer, respectively. The results show that our approach is consistent across datasets and network models, and indicate that the proposed distance metric can offer an efficient way of determining when automated predictions are acceptable and when they should be deferred to human operators.
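
The procedure described in the abstract (per-class centroids from correct predictions, a per-class threshold taken from the nearest incorrect prediction, and a "not known" answer outside that threshold) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The function names are hypothetical, Euclidean distance is an assumption (this summary does not name the distance), and a class with no incorrect predictions is assumed never to defer.

```python
import math
from collections import defaultdict


def argmax(values):
    """Index of the largest entry in a list."""
    return max(range(len(values)), key=values.__getitem__)


def class_centroids(softmax_outputs, labels):
    """Mean softmax vector per class, computed over correct predictions only."""
    groups = defaultdict(list)
    for probs, label in zip(softmax_outputs, labels):
        if argmax(probs) == label:
            groups[label].append(probs)
    return {
        cls: [sum(column) / len(vectors) for column in zip(*vectors)]
        for cls, vectors in groups.items()
    }


def class_thresholds(softmax_outputs, labels, centroids):
    """Per class: smallest distance from an incorrect prediction of that
    class to the class centroid (Euclidean distance is an assumption)."""
    thresholds = {}
    for probs, label in zip(softmax_outputs, labels):
        predicted = argmax(probs)
        if predicted != label and predicted in centroids:
            distance = math.dist(probs, centroids[predicted])
            thresholds[predicted] = min(distance, thresholds.get(predicted, math.inf))
    return thresholds


def classify_or_defer(probs, centroids, thresholds):
    """Return the predicted class, or None ("not known") when the output
    lies at or beyond the threshold for the predicted class."""
    predicted = argmax(probs)
    if predicted not in centroids:
        return None  # assumption: no centroid means no basis for confidence
    distance = math.dist(probs, centroids[predicted])
    # Assumption: a class without incorrect predictions has no threshold.
    if distance >= thresholds.get(predicted, math.inf):
        return None
    return predicted


# Toy two-class data (made up for illustration, not from the paper).
outputs = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9], [0.6, 0.4]]
labels = [0, 0, 1, 1, 1]  # the last example is misclassified as class 0

centroids = class_centroids(outputs, labels)
thresholds = class_thresholds(outputs, labels, centroids)
print(classify_or_defer([0.95, 0.05], centroids, thresholds))
print(classify_or_defer([0.55, 0.45], centroids, thresholds))
```

In this toy setting the confident output stays close to the class 0 centroid and is accepted, while the less confident output lies farther from the centroid than the nearest known error and is deferred.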

Paper Structure

This paper contains 5 sections, 2 equations, 6 figures, 2 tables, 2 algorithms.

Figures (6)

  • Figure 1: Left to right, MNIST Training Data Image ID 8688 digit 6, the network softmax output and the distances to class centroids, correctly classified as 6 and incorrectly clustered as 5. Notice that the y axis is not on logarithmic scale in this case.
  • Figure 2: Left to right, MNIST Training Data Image ID 35537 digit 6, the network softmax output and the distances to class centroids, correctly classified and correctly clustered as 6.
  • Figure 3: Retention, accuracy and correct-to-incorrect ratio vs. threshold for the ViT trained on CIFAR-10 and the CNN trained on MNIST; results are shown for the training datasets. The red plot represents the ratio of correct to incorrect predictions across all classes at given thresholds, e.g. for CNN/MNIST the ratio is 64:1 at threshold 0.8 and 632:1 at threshold 0.05. The green plot is the accuracy at every threshold, e.g. at threshold 0.8 the accuracy is approximately 98.5%, 64/(64+1), and at threshold 0.05 the accuracy is approximately 100%, 632/(632+1). The blue plot represents the percentage of correct predictions that are discarded as the threshold decreases, e.g. at threshold 0.8 all correct predictions are kept, while at threshold 0.05 over 8% of the correct predictions are discarded, i.e. classed as unknown.
  • Figure 4: English Handwritten Alphabetic Characters nearest distance and example, and averages
  • Figure 5: Percentage of examples at or above thresholds for distance to predicted class centroid across five datasets, showing higher exclusion rates for English Alphabetical Characters and MNISTified CIFAR-10. Total examples: CIFAR-10 (50,000), MNIST (60,000), Eng. Digits (550), Eng. Alphabetical (2,860), MNISTified CIFAR-10 (50,000).
  • ...and 1 more figure
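
The quantities plotted in Figure 3 can be reproduced from a list of distances and correctness flags. The sketch below is a hedged illustration only: the function names are hypothetical, and it assumes that a prediction is kept when its distance to the predicted class centroid is strictly below the threshold and is classed as unknown otherwise.

```python
def accuracy_from_ratio(correct_per_incorrect):
    """Accuracy implied by a correct:incorrect ratio of r:1, i.e. r / (r + 1)."""
    return correct_per_incorrect / (correct_per_incorrect + 1)


def sweep(distances, is_correct, thresholds):
    """For each threshold, keep predictions with distance below it and report
    accuracy, the percentage of correct predictions discarded, and the
    correct:incorrect ratio among the kept predictions."""
    total_correct = sum(is_correct)
    rows = []
    for threshold in thresholds:
        kept = [ok for d, ok in zip(distances, is_correct) if d < threshold]
        n_correct = sum(kept)
        n_incorrect = len(kept) - n_correct
        rows.append({
            "threshold": threshold,
            "accuracy": n_correct / len(kept) if kept else None,
            "discarded_correct_pct": (
                100 * (total_correct - n_correct) / total_correct
                if total_correct else None
            ),
            # The ratio is undefined when no incorrect prediction is kept.
            "ratio": n_correct / n_incorrect if n_incorrect else None,
        })
    return rows


# The two ratios quoted in the Figure 3 caption for CNN/MNIST.
print(round(100 * accuracy_from_ratio(64), 1))   # 98.5
print(round(100 * accuracy_from_ratio(632), 1))  # 99.8

# Toy distances and correctness flags (made up for illustration).
for row in sweep([0.01, 0.02, 0.5, 0.6], [True, True, True, False], [0.8, 0.05]):
    print(row)
```

Lowering the threshold in the toy data removes the one incorrect prediction, raising accuracy, but also discards a correct prediction, which is the retention trade-off the caption describes.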