Explorations of the Softmax Space: Knowing When the Neural Network Doesn't Know
Daniel Sikar, Artur d'Avila Garcez, Tillman Weyde
TL;DR
This paper tackles the reliability of neural network decisions under distribution shift by introducing a centroid-based confidence metric in the softmax space. It clusters softmax output vectors, computes class centroids from correct predictions, and uses the distance to these centroids, together with a data-driven threshold, to decide when to return a not-known response. The approach is evaluated on MNIST with a CNN and on CIFAR-10 with a Vision Transformer. The results show that correct predictions consistently cluster near their centroids and that more not-known responses are returned for out-of-distribution-like data. The threshold offers a tunable balance between retention and accuracy. Together, these results suggest a practical, low-cost mechanism to signal low-confidence predictions and defer uncertain cases to human operators in safety-critical settings.
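The trade-off between retention and accuracy can be illustrated with a minimal sketch. It assumes each prediction has already been reduced to two values: its distance to the centroid of its predicted class, and a flag saying whether the prediction was correct. The function name `retention_and_accuracy`, the strict `<` acceptance rule, and the toy numbers are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def retention_and_accuracy(
    distances: Sequence[float], correct: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Retention = share of predictions accepted (distance < threshold, an assumed rule);
    accuracy = share of accepted predictions that are correct (NaN if none are accepted)."""
    accepted: List[bool] = [ok for d, ok in zip(distances, correct) if d < threshold]
    retention = len(accepted) / len(distances) if distances else 0.0
    accuracy = sum(accepted) / len(accepted) if accepted else math.nan
    return retention, accuracy


# Made-up distances of six predictions to their predicted class's centroid,
# with flags saying whether each prediction was correct.
dists = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
is_correct = [True, True, True, False, True, False]

for t in (0.25, 0.5, 1.0):
    r, a = retention_and_accuracy(dists, is_correct, t)
    print(f"threshold={t}: retention={r:.2f}, accuracy={a:.2f}")
```

In this toy example, tightening the threshold from 1.0 to 0.25 lowers retention from 1.00 to 0.50. It also raises accuracy on the retained predictions from 0.67 to 1.00.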
Abstract
Ensuring the reliability of automated decision-making based on neural networks will be crucial as Artificial Intelligence systems are deployed more widely in critical situations. This paper proposes a new approach for measuring confidence in the predictions of any neural network whose predictions come from a softmax layer. We observe that even a highly accurate trained network may produce certain outputs that warrant low confidence. In such cases, the decision should be deferred, and it is more appropriate for the network to give a \textit{not known} answer to the corresponding classification task. Our approach clusters the softmax output vectors and measures the distance between each network output and the cluster centroids. We show that a class centroid, computed simply as the mean softmax output over all correct predictions of that class, can serve as a suitable proxy for evaluating confidence. We define the distance threshold for a class as the smallest distance from an incorrect prediction to that class's centroid. This gives a simple way to return a \textit{not known} answer for any network classification that falls outside the threshold. We evaluate the approach on the MNIST and CIFAR-10 datasets using a Convolutional Neural Network and a Vision Transformer, respectively. The results show that our approach is consistent across datasets and network models. They also indicate that the proposed distance metric can offer an efficient way of determining when automated predictions are acceptable and when they should be deferred to human operators.
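The centroid-and-threshold rule described in the abstract can be sketched in plain Python. This is a minimal, hypothetical illustration and not the authors' implementation. It relies on three assumptions. First, it uses Euclidean distance. Second, it computes a class's threshold from the incorrect predictions whose predicted label is that class. Third, it treats a distance equal to or beyond the threshold as \textit{not known}. The function names and toy softmax vectors are invented for illustration.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two softmax vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def argmax(v: Vector) -> int:
    """Index of the largest softmax entry, i.e. the predicted class."""
    return max(range(len(v)), key=lambda i: v[i])


def fit_centroids_and_thresholds(
    outputs: List[Vector], labels: List[int], num_classes: int
) -> Tuple[Dict[int, List[float]], Dict[int, float]]:
    preds = [argmax(p) for p in outputs]
    centroids: Dict[int, List[float]] = {}
    for c in range(num_classes):
        # Softmax vectors of the correct predictions for class c.
        correct = [p for p, y, yhat in zip(outputs, labels, preds)
                   if yhat == c and y == c]
        if correct:
            # Column-wise mean = centroid of the correct-prediction cluster.
            centroids[c] = [sum(col) / len(correct) for col in zip(*correct)]
    thresholds: Dict[int, float] = {}
    for c, centroid in centroids.items():
        # Distances of incorrect predictions labelled c (assumption) to c's centroid.
        wrong = [euclidean(p, centroid) for p, y, yhat in zip(outputs, labels, preds)
                 if yhat == c and y != c]
        # No incorrect predictions for c: accept everything (assumption).
        thresholds[c] = min(wrong) if wrong else math.inf
    return centroids, thresholds


def classify_or_defer(
    p: Vector, centroids: Dict[int, List[float]], thresholds: Dict[int, float]
) -> Optional[int]:
    c = argmax(p)
    if c not in centroids or euclidean(p, centroids[c]) >= thresholds[c]:
        return None  # "not known": defer to a human operator
    return c


# Toy, made-up softmax outputs for a 2-class problem.
train_outputs = [[0.9, 0.1], [0.7, 0.3], [0.6, 0.4],
                 [0.2, 0.8], [0.1, 0.9], [0.45, 0.55]]
train_labels = [0, 0, 1, 1, 1, 0]
centroids, thresholds = fit_centroids_and_thresholds(train_outputs, train_labels, 2)

for query in ([0.85, 0.15], [0.55, 0.45], [0.3, 0.7]):
    print(query, classify_or_defer(query, centroids, thresholds))
```

With these toy vectors, the first and third queries are accepted as classes 0 and 1. The second query, [0.55, 0.45], is predicted as class 0 but lies beyond that class's threshold of about 0.283, so it is deferred.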
